Clarity needed on data protection


As a commendable European law on personal data comes into force, the research community must 
not let excessive caution about data sharing, however understandable, become the default position. 


European policymakers have been discussing new rules on data protection
for years, and scientists and universities — like everyone else across the
continent — are about to see the results. Entering into force on 25 May, a
new law known as the General Data Protection Regulation (GDPR) is designed
to protect the personal privacy of citizens and will overhaul how personal
data are collected, handled, processed and stored. It’s a welcome move to
safeguard individuals and is the biggest shake-up of data protection in
more than 20 years.

However, as this journal has noted before, earlier drafts of the law
posed a problem for science and the research community. Of particular
concern was the issue of consent — the draft language suggested
researchers would be required to seek renewed consent to reuse data
collected for a different purpose, which could have introduced delays
and made some research impractical. But many in the research community
worked relentlessly to warn policymakers of the potential harm. In
response, officials put in place rules that exempt research from some of
the requirements, provided the proper safeguards are in place.
Universities and organizations have introduced plans to make sure they
are. The bulk of the work should be done.

The passing of the final GDPR rules is, therefore, a good example of
political engagement by researchers and their advocates, and a sensible
and informed reaction from policymakers. Those involved, on both sides,
deserve great credit. Harmonization of how data can be sourced, stored
and used would, in theory, be good for research. It could smooth the
difficulties that scientists face when they try to pool analysis of
genomic data and tissue samples across national borders. Such sharing
could help scientists to organize powerful trials with large numbers of
participants.

But although there is some cause for celebration, there are still
outstanding issues. And that means that the same researchers and
advocates must remain vigilant.

The problem is that individual European countries have been left to
decide some issues for themselves — for example, how scientific data can
be processed. This flexibility is intended to allow countries to fit the
rules around existing systems and different cultures, but it might leave
nations out of step. Researchers who work under different systems could
struggle to share data with each other. That could lead to delays in
negotiations between institutions wanting to create collaborative
contracts that enable data sharing.

To help prevent this and to offer a unified approach, academics,
industry representatives and patients have been meeting over the past
year to distil the complex regulation into a user-friendly guide. This
planned code of conduct aims to provide a simple ‘how-to’ guide for
scientists, for example, by explaining differences in the way countries
such as Germany and the United Kingdom define ‘anonymized’ data.
The resulting Code of Conduct for Health Research, overseen by the
biobank network BBMRI-ERIC (see J.-E. Litton Nature 541, 437; 2017),
is almost ready for consultation. But meanwhile, medical research
remains vulnerable to unintended consequences of the new law.
That's because, until the code of conduct is in place to offer clear guid- 
ance about how to comply with the GDPR, day-to-day decisions on how 
to interpret the law will be left to individual institutions’ legal depart- 
ments. It would be understandable if they chose to err on the side of cau- 
tion and place restrictions on sharing data for fear of breaking the law. 
Even when the code is finalized, it must still be approved by the
European Data Protection Board (EDPB), which has not yet said how
organizations can submit such codes for evaluation, or how long the
process will take.

Some have argued that delays in the code becoming available could be
beneficial, because they would allow the research community to thrash out
the details of this complicated area of the law. But others worry that if
the process drags on too long, medical research will suffer. What starts
as a cautious position on how best to share data in line with the law
could drift into normal practice.

That would be a missed opportunity and could risk undermining 
the good work done so far. Officials on the EDPB must not allow that 
to happen. The code must be approved and put into practice as soon 
as possible. It’s important to protect people’s personal data; but it’s 
also important to ensure data can be used with integrity to support 
valuable research.


Climate costs 


A strong financial case for urgent action on 
greenhouse-gas emissions has now been made. 


Published in 1991, an academic paper called ‘To slow or not to slow:
the economics of the greenhouse effect’ is seen as the first attempt
to model the economics of global climate change (W. D. Nordhaus
Econ. J. 101, 920–937; 1991). Written by the economist Bill Nordhaus,
the 18-page study assessed the costs of acting on emissions and the
estimated costs of not doing so, and concluded that it was better for
the economies of the world to try to address the problem than simply
to give up and take the consequences.

Economists and analysts around the world have repeated the
exercise many times, most prominently with the British government’s
Stern Review in 2006 (N. Stern The Economics of Climate Change:
The Stern Review; Cambridge Univ. Press, 2007). Almost all agree with
the original conclusion: it will be much cheaper to spend the money
on trying to curb emissions than to pay for the impact of the resulting
climate change. But how much cheaper? There’s the rub.

A study in Nature this week offers the latest and most comprehensive 
attempt to address this question (see page 549). Marshall Burke and his 
colleagues modelled the impact of historical temperature changes on 
the gross domestic product (GDP) of 165 countries between 1960 and 
2010. They then ran the model to the end of this century to plot what 
would happen to GDP according to how much average temperature rise 
was expected. The results show that greater action to curb temperature 
will bring greater economic benefits — which are, in reality, avoided 
damages, measured as the impact on GDP. 

Specifically, there is a 75% chance that keeping global warming to a 
1.5°C rise — an aspiration of the Paris climate treaty — will leave the 
world better off than letting it run to a 2°C rise. The probable savings:
a cumulative US$20-trillion increase in world GDP by the end of the 
century. (Global GDP in 2016 was about $76 trillion.) 

It’s fair to say that not all of the world’s economists and climate- 
policy wonks will be content to take the conclusions of the study at 
face value. Details matter, not least — as with all models — the kinds 
of assumptions made and data used. The Stern Review, for example, 
was quickly queried by economists who criticized the way in which it 
borrowed from the insurance industry and placed great importance 
on the needs of future generations, who are usually discounted in 
models of economic impact because it is assumed that they will be 
considerably richer and so better able to deal with problems than are 
today’s generation. 

In that spirit of debate, this week we also publish two — conflict- 
ing — opinions on this study in a News & Views Forum (see page 498). 
It provides a glimpse of the debate already raging in the economics 
community. One point of contention is how fair it is to simply extra- 
polate from past trends into the future. As finance experts are keen 


to point out, past performance is not always a reliable guide to future 
yields, and in this case it could be that the people of the future will 
find ways to adapt to a changing climate that are not accounted for 
in the model. Such adaptation — the development and widespread 
introduction of drought-resistant crops, for instance — would offer 
a significant saving, because food prices would not increase so 
dramatically if harvests are protected. 

Another feature of the new model is that it assumes that climate change,
and the extreme weather it is expected to bring, will have a compound
impact on the rise of a nation’s GDP. Thus, a devastating storm or
washed-out summer would affect not just that year’s economic performance,
but also its performance in subsequent years. Previous studies have taken
a more optimistic view that any damage could quickly be compensated for.

There is a more fundamental issue, too. Just how reliable is GDP as a
metric? Famously, it assumes that the market price of goods and services
fully reflects the costs of their production and use. And the economics
of climate change don’t always do that: the price of fossil fuels, for one,
doesn’t take account of the costs associated with future warming.

Like all models, these economic projections will be argued over, 
worked on and ultimately improved. Scientists can gather the data to 
help that process, for example by expanding studies of the regional 
effects of climate change to poorer nations that are already bearing 
the brunt of the physical and economic impacts. Meanwhile, the argu- 
ments for acting on greenhouse-gas emissions, already many and 
varied, just got a little stronger. Our burning of fossil fuel is writing 
cheques that our economy can’t afford to cash.


Road to nowhere 


Electric cars are gaining ground fast but face 
fossil-fuel favouritism in the showroom. 


Who killed the electric car? According to the 2006 documentary of that
name it was the automobile companies, and especially General Motors (GM),
which produced, and then recalled
and crushed, thousands of its pioneering EV1 model in the late 1990s. 
Arguments still rage about the company’s true motives (GM insists it 
was down to high costs), but two decades on from the EV1 with its niche 
appeal, it’s clear that reports of the death of electric vehicles have been 
greatly exaggerated. Sales in some places are booming. Figures from the 
Centre of Automotive Management in Gladbach, Germany, show that 
nearly half of the new vehicles registered in Norway during the first three 
months of this year were electric. During the same period, China sold 
more than 142,000 electric vehicles — still just 2% of the total numbers 
sold, but a large increase on last year. 

What drives these sales? According to a study published this week in 
Nature Energy (G. Zarazua de Rubens, L. Noel & B. K. Sovacool Nature 
Energy https://doi.org/10.1038/s41560-018-0152-x; 2018), it’s not the 
sales staff who work at car dealerships — at least not in most nations 
in Scandinavia. “Do not buy this, it will ruin you,” one prospective
buyer was told when they asked about an electric car on sale. Another
would-be customer was gently steered away from an electric model 
because, the sales person wrongly insisted, it would take two days to 
drive 350 kilometres — roughly the distance between New York City 
and Washington DC. 

We know this because, in these cases, the customers had no 
intention of buying a car — electric or otherwise. They were under- 
cover university researchers, indulging in a little ‘mystery’ shopping to 
test industry attitudes and the barriers that remain to the widespread 
adoption of new technologies. In this case, the attitude of the sales
staff — largely driven by them not knowing as much about the electric 
models — was hugely influential. The study analysis suggests that it 
is the most important predictor of the likelihood that a customer will 
leave having bought an electric car — which the researchers calculated 
was a dismal 0% in many of the cities they visited. 

In all, the researchers underwent 126 shopping experiences in 82 car 
dealerships across Denmark, Finland, Iceland, Norway and Sweden. 
(The ethics of mystery-shopping exercises have been questioned — they 
waste the time and money of the targets — so the researchers did not 
spend more than about ten minutes talking to the sales staff in each 
case.) They conclude that dealers were dismissive of electric vehicles 
and misinformed shoppers about vehicle specifications. In many cases, 
it took persistent questions from the mystery shoppers to get the electric- 
car dealers just to admit that yes, they did actually sell electric cars. 

Why would car sales staff make it so difficult for customers to buy 
a car? Because they want them to buy a different kind of car. As the 
researchers point out, dealers “strongly oriented customers towards 
petrol and diesel vehicle options” on sale alongside the electric 
versions. And that behaviour is typical. The researchers argue that 
the attitude “mirrors industry and government favouritism towards 
conventional cars”. 

Why does this matter? Electric cars are an important strategy for 
sustainable transport and have reached the point where sales to early 
adopters must start to give way to sales to a larger “early majority” 
(J. Lynes Nature Energy https://doi.org/10.1038/s41560-018-0173-5; 
2018). There is a well-known and much-feared chasm between the two 
stages, and one that policymakers are trying to bridge with incentives 
such as subsidies, investment in infrastructure and privileged access 
to road space (such as allowing electric cars into lanes banned to other 
cars carrying no passengers). The mystery-shopping study highlights a 
new and important part of the bridge. Attitudes and incentives in deal- 
erships must be changed — even simple steps such as better training 
and offering higher commission on successful sales of electric vehicles 
could help.
WORLD VIEW
A personal take on events


Transparency rule is a Trojan Horse


The US Environmental Protection Agency is co-opting scientific trappings
to sow doubt, warns Naomi Oreskes.


Last month, the US Environmental Protection Agency (EPA) proposed a new
rule to “ensure that the regulatory science underlying Agency actions is
fully transparent, and that underlying scientific information is publicly
available in a manner sufficient for independent validation”. The alleged
justification is a crisis in science over replicability and
reproducibility.

At face value, the proposal might seem reasonable. It isn’t.

Many EPA watchers believe that the rule targets long-term
epidemiological studies that linked air pollution to shorter lives and
were used to justify air-quality regulations. In my view, the rule could
keep that and other high-quality evidence from being used to shape
regulations, even if there are legitimate reasons, such as patient privacy,
why some data cannot be made public. It could potentially retroactively
exclude an enormous amount of respected evidence. This would make
the EPA less able to serve its function “to protect human health and
the environment”. The window for speaking up is closing fast.

There is a crisis in US science, but it is not the one claimed by
advocates for the rule. The crisis is the attempt to discredit scientific
findings that threaten powerful corporate interests. The EPA is following
a pattern that I and others have documented in regard to tobacco smoke,
pollution, climate, and more. One tactic exploits the idea of scientific
uncertainty to imply there is no scientific consensus. Another, seen in
the latest efforts, insinuates that relevant research might be flawed.
To add insult to injury, those using these tactics claim to be defending
science.

A previous attempt to restrict the use of vetted science was the 2001
Data Quality Act, which resulted in guidelines for how information
could be used and disseminated. The Competitive Enterprise Institute,
a think tank in Washington DC dedicated to limited government, was
quick to invoke it to try to prevent the distribution of a major EPA
report on climate change.

Those lobbying for data that underlie regulations to be publicly
available have not made similar demands for other data, such as the
composition of fracking fluids, or the information confidentially
supplied by companies to register pesticides with the EPA.

Guests present when the EPA administrator unveiled the rule
included US Congressman Lamar Smith (Republican, Texas), who has
repeatedly introduced legislation to exclude research justifying the US
Clean Air Act; Myron Ebell of the Competitive Enterprise Institute, who
has long challenged the scientific consensus on climate change; and
lawyer Steve Milloy, who also disputes anthropogenic global warming
and has long ties to the tobacco industry, which floated similar proposals
in the 1990s to try to thwart regulation of second-hand smoke.

Conspicuously absent were the scientific organizations that are
working to improve data and transparency in research. Three prominent
journals (Nature, the Proceedings of the National Academy of
Sciences, and Science) issued a joint statement condemning the rule,
even though new policies at the journals had been used to justify it.

One week before the proposed rule was announced, a group called 
the National Association of Scholars — which uses an acronym easily 
confused with that of the prestigious National Academy of Sciences — 
released a report called “The Irreproducibility Crisis of Modern Science”.
The association (I will not call it the NAS!) describes itself as dedicated 
to “academic freedom and disinterested scholarship” and has focused 
mostly on critiquing undergraduate courses. The report dwells fre- 
quently on climate science, yet the greatest concerns among scientists 
over reproducibility relate to biomedicine and psychology. The asso- 
ciation’s president, anthropologist Peter Wood, has compared climate 
scientists to circus hucksters. Smith championed the report's launch. 

I urge the scientific community to get out ahead of efforts that I 
believe are intended to exploit discussions about reproducibility and 
transparency for political ends. For starters, 
researchers should recognize that the term ‘regu- 
latory science’ as used in the rule does not carve 
out some separate category of work commissioned 
by government agencies. It applies to all science. 

The geochemists, hydrologists and forest ecolo- 
gists who worked out the cause of acid rain in the 
1960s and 1970s did not set out to study air pol- 
lution. They did not think of themselves as envi- 
ronmentalists, although they became embroiled in 
a public-policy debate. If regulators had ignored 
ground-breaking papers, they would not have 
acted to control acid rain. Something similar can 
be said for the atmospheric chemists in the 1970s 
and 1980s who realized that chlorofluorocarbons were depleting the 
ozone layer. That led eventually to the Montreal Protocol, once decried 
by industry groups — now hailed as a success. 

Even the EPA’s own science advisory board has protested that the rule 
seems to have been designed without input from the scientific commu- 
nity, that public access to data from older studies might not be feasible, 
and that consideration must be given to confidentiality and privacy and, 
for complex existing data sets, the cost and effort of making them acces- 
sible. The rule also fails to take into account extant government mecha- 
nisms for vetting science or ways to conduct independent re-analyses 
without publicly releasing subjects’ personal data. 

Robust science is being challenged and needs to be defended. 
Scrutiny is appropriate. Making it more difficult to apply science to 
governmental regulations is not. 

The comment period on the proposed rule closes soon (30 May). 
Scientists can submit comments at go.nature.com/2i0iz46. Do.


Naomi Oreskes is a historian of science at Harvard University in 
Cambridge, Massachusetts, and co-author of the book Merchants of 
Doubt. 

e-mail: oreskes@fas.harvard.edu 
SEVEN DAYS


POLICY


Data dissent 


A panel of science advisers 

to the US Environmental 
Protection Agency (EPA) has 
sharply criticized a proposed 
rule that would limit the 
agency’s use of research based 
on underlying data that are 
not publicly available. In a
12 May memorandum, a
working group of the EPA 
Science Advisory Board said 
that the proposed rule could 
have broad ramifications and 
should be submitted to the 
full advisory board for review. 
The working group said that 
the Trump administration 
failed to explain the benefits 
of the proposal, did not 
properly assess its potential 
regulatory impacts, and failed 
to answer practical questions 
about its implementation. See 
page 469 for more. 


Asset uncertainties
The Australian Academy of Science is concerned about the lack of detail
in the government’s plan to spend Aus$393 million (US$297 million) on
research facilities over the next 5 years. On 15 May, the government
announced its research infrastructure investment plan. Money will be
spent on infrastructure such as next-generation microscopes and imaging
equipment, advanced nanoscale fabrication facilities and Australia’s
nuclear-research capacities. But the academy is concerned that there is
limited detail on when money will be available, and that crucial
infrastructure upgrades could be several years away.


ANNOUNCEMENT
Following discussions at the World Health Organization last week, it has
become clear that the outbreak of Ebola in the Democratic Republic of
the Congo requires prompt responses by the research community to help
minimize the considerable risks of spread.

Accordingly, Nature’s editors and those of other appropriate Nature and
Nature Research journals will treat relevant submissions with priority.
On publication, Nature’s publishers will make the research freely
available for a period subject to future review.


UK eyes EU science scheme
The UK prime minister has declared that she wants her country to be part
of the European Union’s next major research-funding programme, which will
launch after Britain leaves the EU in March 2019. In a speech on 21 May,
Theresa May (pictured) said the government would like the option to
“fully associate” itself with Horizon Europe, which will run from 2021 to
2027 and will be the European Commission’s largest-ever science-funding
scheme, with a planned budget of almost €100 billion (US$118 billion).
May said that in return for paying to take part, she wanted to influence
the shape of the scheme. Thirteen non-EU countries are associate members
of the current programme, Horizon 2020, and pay a proportion of their
gross domestic product so that their scientists can bid for European
grants. But none of these nations has input on the programme’s
priorities.


RESEARCH
Heart regeneration 


Japanese physicians are poised 
to conduct the world’s first 
clinical study using induced 
pluripotent stem (iPS) cells 

to treat heart disease. On 

16 May, a health ministry 
panel approved a plan by 


Osaka University doctors to 
use the cells to treat ischemic 
cardiomyopathy, in which 
reduced blood flow to the 
heart compromises the 
organ’s ability to pump blood. 
Heart-muscle cells created 
from iPS cells will be grown 
in 0.1-millimetre-thick sheets 
that are expected to release 
growth factors that will 
regenerate the heart muscle. 
iPS cells have been used in 
several trials to treat retinal 
disease in humans, but at least 
one of those has been halted 
after a patient had an adverse 
reaction to the transplant. 
The heart trial is expected 

to start in three people by 
March 2019. 


Volcano explosion 


The summit crater of Hawaii's 
Kilauea volcano exploded on 
17 May, sending a plume of 
ash about 9 kilometres into the 
air. The explosion happened 
after two weeks of increased 
activity, during which lava 
had spewed through volcanic 
fissures and triggered 
earthquakes that reached 
magnitude 6.9. The volcanic 
activity has damaged at least 
36 structures and prompted 
more than 1,800 evacuations. 
Volcanic air pollution has 
been reported in the southern 
part of Hawaii’s Big Island. 
The US Geological Survey 
warned the eruption could 
become more violent and eject 
“ballistic” rocks up to 2 metres 
in diameter. See page 477 for 
more. 


Study on hold
The US National Institutes of Health (NIH) has halted a US$100-million
study into the health effects of drinking, after news reports suggested
that the alcohol industry might have funded it improperly. On 17 May, NIH
director Francis Collins said that he was putting the study on hold while
his agency investigated allegations that its officials had courted
funding from the alcohol industry in violation of NIH rules. The Moderate
Alcohol and Cardiovascular Health trial was being run at Harvard
University’s Beth Israel Deaconess Medical Center in Boston,
Massachusetts, and intended to recruit around 7,800 adults for a 10-year
study into whether a single drink per day could confer health benefits.


UK visas refused
More than 3,500 scientists and medical and engineering professionals were
refused visas to work in the United Kingdom between December 2017 and
March 2018. Applicants were refused despite being eligible because the
government had, for the fifth month in a row, breached its cap on a class
of visa for skilled workers known as Tier 2. Because candidates for
PhD-level roles were prioritized for visas, academic scientists were
largely unaffected. The government released the figures on 16 May in
response to a freedom-of-information request.


SPACE


Observatory pact
Two rival telescope organizations have joined forces to give US
astronomers broader access to the next generation of big ground-based
observatories. The partnership involves the Giant Magellan Telescope
Organization (pictured, one of its mirrors), which is building a
24.5-metre telescope in Chile, and the Thirty Meter Telescope
International Observatory, which aims to construct in either Hawaii or
the Canary Islands. They, along with the National Optical Astronomy
Observatory, will encourage US astronomers to propose collaborative
research programmes for the extremely large telescopes, which the US
National Science Foundation could consider funding.


Energy appointment
US President Donald Trump nominated Christopher Fall to lead the
Department of Energy’s Office of Science on 18 May. Fall is currently the
principal deputy director of the department’s Advanced Research Projects
Agency-Energy. Before that, he was the acting chief scientist at the
Office of Naval Research and assistant director for defence programmes at
the White House Office of Science and Technology Policy. Fall has a PhD
in neuroscience and a master’s in business administration. At the Office
of Science, he will oversee a research budget of around US$5.4 billion.


Russian minister
The Russian government has divided its science and education ministry in
a shake-up after national elections in March, which handed Vladimir Putin
his fourth term as president. Prime Minister Dmitry Medvedev appointed
the government’s new cabinet on 18 May, putting Mikhail Kotyukov in
charge of the newly created Ministry of Science and Higher Education.
Olga Vasilyeva, a church historian who became head of the now-defunct
Ministry of Science and Education in 2016, will head the new Ministry of
Education. Kotyukov was previously chief of the Federal Agency for
Scientific Organizations, which was established in 2013 and reported
directly to Putin. The agency, which will now be dismantled, was
controversial because it was given effective control of the Russian
Academy of Sciences.


TREND WATCH
Two-thirds of scientists share results outside their trusted circle
before formally publishing them, a survey of more than 7,000 researchers
finds. Social scientists and mathematicians are most likely to disclose
findings before publication, and computer scientists least likely. About
40% of scientists disclose a result after they are sure of its validity,
and 21% share findings once they’ve written or submitted a manuscript.
The respondents worked in nine fields in the United States, Germany and
Switzerland.

[Figure: SHARING WITH OTHERS. A survey of more than 7,000 researchers
finds that about 67% of scientists disclose their results outside their
circle of trusted colleagues before formal publication — but when varies
widely by field. Source: ... & L. Jiang Sci. Adv. 4, eaar2133 (2018).]


HEALTH 


Ebola vaccine 

Health officials in the 
Democratic Republic of 

the Congo (DRC) began to 
distribute an experimental 
vaccine against Ebola on 

21 May, as an outbreak of 

the infection continues to 
worsen. Forty-six people had 
been infected with the virus 
and 26 had died as of 21 May, 
according to the World Health 
Organization. An expert 
committee assembled by the 
agency says that the health risk 
from Ebola in the DRC is now 
“very high”. See page 475 for
more. 


Migraine medicine 
The US Food and Drug 
Administration has approved 
the first in an emerging class 
of drugs intended to prevent 
migraines. On 17 May, 

the agency announced 

the approval of Aimovig 
(erenumab-aooe), an antibody 
therapy that blocks a protein 
called calcitonin gene- 

related peptide receptor. This 
molecule is involved in pain 
perception, and other migraine 
treatments that target it are 
following close behind. Amgen 
of Thousand Oaks, California, 
developed the drug with 
Novartis of Basel, Switzerland. 


Health Ministry officials carry the first batch of experimental Ebola vaccines in Kinshasa, Democratic Republic of the Congo, on 16 May 2018.
Experimental drugs poised
for use in Ebola outbreak 


International health groups are in discussions with the Democratic Republic of the Congo. 


BY ERIKA CHECK HAYDEN 


Aid workers responding to the Ebola outbreak in the Democratic
Republic of the Congo (DRC) are
seeking approval to treat patients with experi- 
mental drugs. These include three potential 
treatments — ZMapp, favipiravir and GS-5734 
— that were given to patients during the 2014–16
Ebola epidemic in West Africa.

The drugs are being considered in addition 
to the use of an experimental vaccine; none of 
the treatments has been definitively proved to 
lower the risk of death from Ebola. 

The move to test experimental drugs and 
vaccines early in the outbreak, which was 
confirmed on 8 May, is part of a push to start 
research as soon as possible after Ebola cases
are detected to save lives. That’s a change from
the past, when doing research during an out- 
break was seen as a distraction. 

The switch has been driven by the availability 
of new vaccines and drugs — and by memories 
of the 2014-16 epidemic. Officials were so slow 
to deploy potential vaccines and drugs that the 
epidemic had waned before clinical trials could 
start. Now, “there’s an acceptance that research 
during an outbreak is something we need to do”,
says Daniel Bausch, director of the UK Public 
Health Rapid Support Team in London. 

The DRC allowed the use of an experimen- 
tal Ebola vaccine during the country’s last 
outbreak, in May 2017, although the outbreak 
ended before the vaccine was shipped. Earlier 
this month, the government approved the 
first shipments of the vaccine and the country
now has more than 7,500 doses on hand. The
vaccine was given to health-care workers 
beginning on 21 May, and could be adminis- 
tered this week to patients and their contacts. 

Public-health experts hope that the experimental
vaccine, called rVSV-ZEBOV, will
help to control the outbreak. Forty-six peo- 
ple have been infected and 26 have died, the 
World Health Organization (WHO) said on 
21 May. The virus has spread over a wide area 
and infected at least one person in a major city, 
Mbandaka — home to 1.2 million people. 

The rVSV-ZEBOV vaccine, manufactured 
by Merck, was shown to protect against Ebola 
in a trial run during the West African epi- 
demic. None of the 5,837 volunteers who took 
the vaccine in that trial became infected. 

Officials in the DRC have quashed eight 
previous outbreaks through conventional
public-health measures, such as tracking down 
people with Ebola and their contacts to under- 
stand the disease’s path. But they worry about 
how far the virus has already travelled this time 
and about the possibility that it could spread 
even farther — as it did in the West African 
epidemic, which took root in 3 countries and 
claimed more than 11,000 lives. “We think the 
outbreak could become complicated, as it did 
in West Africa, so we must do everything to 
stop it,” says Jean-Jacques Muyembe-Tamfum,
director-general of the National Institute for 
Biomedical Research in Kinshasa. 

Whether that will include deploying exper- 
imental drugs is now under discussion. The 
WHO is consulting experts to consider the 
evidence for such treatments, and the medi- 
cal humanitarian organization Médecins Sans 
Frontières (MSF) is talking to DRC officials
about using them, says Annick Antierens, who 
coordinates Ebola clinical trials for MSF. 

Although the rVSV-ZEBOV vaccine could 
help prevent new infections, Antierens says, 
experimental drugs might still be needed 
because officials lack a good understanding of 
where Ebola first emerged during this outbreak 
or how it is spreading. So there are likely to be
very many people who are already infected. 

Using experimental vaccines and drugs in 
an outbreak raises logistical and ethical com- 
plexities, such as delivering them to remote 
settings by aeroplane or motorbike and design- 
ing humane and rigorous clinical trials. The 
2014-16 Ebola outbreak saw controversy over 
whether potential drugs and vaccines should 
be tested in trials that randomly assign patients 
to receive either the experimental treatment or 
standard care. MSF and officials at the WHO 
argued that withholding the medicines from 
desperate patients would be unethical. 

Some of the treatments MSF is now 
considering were given to varying numbers of 
people in the 2014-16 epidemic. ZMapp, an 
antibody treatment made by Mapp Biophar- 
maceutical in San Diego, California, was tested 
in a 72-person trial; 22% of the 36 people who 
received the drug died, compared with 37% 
of the 35 who did not receive it. Favipiravir, 
an antiviral drug from the Japanese company 
Toyama Chemical, was given to 126 patients 
in the West African outbreak, and a few dozen
in other trials. The antiviral drug GS-5734 was 
given to three people. 

The Congolese Ministry of Health and a 
national ethics review board would need to 
approve new drug trials. Observers say that 
studies must proceed more equitably than they 
did in the 2014-16 outbreak, when experimen- 
tal treatments were given first to international 
doctors and aid workers. 

“We were pretty tone deaf,” says Lawrence
Gostin, director of the WHO Collaborating 
Center on Public Health Law and Human 
Rights at Georgetown University in Wash- 
ington DC. “We need to do that completely 
differently this time.”
Indonesia plans strict
foreign research laws


Regulations could hamper international collaborations. 


BY DYNA ROCHMYANINGSIH 


Scientists in Indonesia fear that a government
plan to introduce stringent rules for
foreign researchers will scare off potential
collaborators and hamper experiments. The 
proposals also suggest tough new penalties, 
including prison sentences, for foreign scien- 
tists who break some existing rules, such as the 
requirement to have a research permit. 

Next month, representatives from two 
science academies will meet with politicians 
in the hope of convincing them to reconsider 
the proposals. 

“The new regulations will only repel 
foreign scientists to do research in Indonesia, 
and this is not good for Indonesia's science,” 
says Berry Juliandi, a member of the Young 
Academy of Sciences and a biologist at Bogor 
Agricultural University. The contribution of 
international scientists is crucial for Indo- 
nesian research because foreign science 
agencies have larger budgets and more 
sophisticated technology, he says. 

Government documents state that the 
proposed regulations for international 
science are meant to protect Indonesia’s 
natural resources and to increase local science 
capacity. The proposals are among several out- 
lined in a draft law submitted to the House of 
Representatives in August 2017. 

If the law is approved by the house, interna- 
tional scientists will have to submit their raw 
data to the research ministry; involve Indone- 
sian colleagues as equal partners in research 
projects; and name all Indonesian researchers 
involved in a project on every peer-reviewed 
paper that arises from the work. 

The draft law also imposes harsh penalties 
on foreign researchers who break existing 
regulations. Foreign scientists will still need a 
government permit to do research, and a spe- 
cial transfer agreement to remove specimens, 
but breaking these rules would be upgraded 
to a criminal offence. Researchers could face
a prison sentence of up to 2 years, or hefty 
fines of as much as 2 billion Indonesian 
rupiah (US$143,000). The current penalty 
for a researcher who violates a permit can 
vary from a verbal warning to the permit 
being revoked. There has been no national 
policy or penalty for scientists who remove 
specimens without an agreement. 

The draft law would also require that international
scientists do research that produces
“beneficial output for Indonesia”.

Erik Meijaard, a conservation scientist 
at the University of Queensland in St Lucia, 
Australia, who studies orangutans in Borneo, 
says the proposal is “unworkable” for 
foreigners: “You could do a few years research, 
find out that the outcomes do not benefit 
Indonesia, and then you cannot publish.”

Meijaard adds that, overall, the draft law 
seems vague and is “certain to turn away 
foreign researchers and stop people from 
studying in Indonesia if there is an unclear 
risk of being fined or sent to jail”. 

The ministry’s director-general for research 
and development reinforcement, Muhammad 
Dimyati, doesn't think the rules will stop col- 
laboration. “We encourage foreign scientists 
to publish their research conducted in Indo- 
nesia. But they should not write alone and they 
have to involve Indonesian scientists. This will 
certainly give benefits to Indonesia's science.” 

Dimyati says every country has a right to 
protect its natural resources for the welfare 
of its people. “The sanctions are intended to
remind scientists about their role in
society, which is to find innovation
that is beneficial for mankind without
violating regulations of a country,”
he says.

Not all researchers
think the penalties are a bad idea. Laksana
Handoko, a physicist at the Indonesian 
Institute of Sciences in Jakarta, supports 
criminal punishments for researchers who 
take specimens out of the country without a 
transfer agreement. It’s stealing, he says. 

Jason von Meding of the University of New- 
castle in Australia, who studies disaster risk- 
reduction in southeast Asia, says that scientists 
in developing countries need protection for 
their work — and that international scientists 
should not be afraid of the draft law if they've 
done nothing wrong. But he thinks that less- 
severe sanctions would be more appropriate 
than the proposed criminal penalties. 

The research ministry’s director of 
intellectual property, Sadjuga, says it could 
be some time before the proposals become 
law, because members of the house have to 
debate the draft, and they are preparing for 
an election in 2019.
Geological activity around the Kilauea volcano in Hawaii includes several fissures that have opened and begun oozing lava.


Hawaii volcano holds clues 
to other eruptions 


Scientists scramble to analyse data from steam explosion at Kilauea. 


BY SARA REARDON 


After weeks of unleashing earthquakes
and lava flows that have forced thousands
of people to evacuate their
homes, Hawaii's Kilauea volcano finally blew 
its top last week. Because Kilauea is one of the 
best-monitored volcanoes in the world, scien- 
tists hope that data from the event will help 
them to better predict when similar volcanoes 
are about to erupt. 

“We'll be working on this set of data for 
our careers,” says Michael Poland, a geo- 
physicist at the US Geological Survey (USGS) 
Cascades Volcano Observatory in Vancouver, 
Washington. 

The USGS says that the eruption began 
at 4:15 a.m. local time on 17 May, when the 
volcano sent a plume of ash and steam more 
than 9,100 metres into the air. 

The many instruments on and around 
Kilauea were watching. The volcano bristles 
with equipment that continuously measures 
signs of geological activity, such as ground 
movement, lava chemistry and seismic 
vibrations. 

The first hint of an impending eruption came 
with a series of earthquakes on 3 May. Soon 
after, fissures opened up in the ground as far 
as 40 kilometres away from the volcano’s rim 
— oozing lava that forced about 2,000 people 
to evacuate. The openings also depressurized
the network of underground channels beneath
Kilauea, including its lava chamber. As a result, 
the lava level within the volcano’s crater quickly
dropped by more than 30 metres. It was, Poland
says, “like someone pulled the plug in a bathtub”.

That caused the walls of the volcano to begin 
crumbling into the crater, creating a layer of 
rock atop the surface of the remaining lava. 
And once the surface of the still-draining lava 
dropped below the water table, water began to 
seep into the crater, creating steam and pres- 
sure beneath the freshly formed rock cap. 

Scientists at the USGS’s nearby Hawaiian 
Volcano Observatory suspected that a steam 
explosion was imminent: in 1924, the same 
pattern of oozing fissures around Kilauea had 
heralded a series of explosions. The researchers 
were proved right on 17 May, when the pres- 
sure that built up in the crater sent debris and 
ash flying. But Poland says that steam explo- 
sions are hard to anticipate. 

The hope this time, he adds, is that the 
extensive instrumentation on Kilauea and data 
collected from the latest eruption will allow 
scientists to develop better markers for pre- 
dicting when a steam explosion is imminent. 

For now, scientists in Hawaii and around 
the world are watching Kilauea and waiting 
to see what else the volcano will do. “This 
has so far been playing out how USGS said 
it would,” says Janine Krippner, a volcanologist
at Concord University in Athens, West
Virginia. “That itself is incredible.”

Kilauea probably isn’t done erupting yet, if 
the 1924 event is a guide: it went on for more 
than 2 weeks and caused more than 50 steam 
explosions. Hawaii’s state volcanologist, Bruce 
Houghton, says that the current event seems to 
be a type of complex eruption that occurs only
once every 50 to 80 years. 

The recent activity has prompted the state of 
Hawaii to order the evacuation of several areas 
near Kilauea, protecting residents from the 
greatest eruption danger — flying rocks. But ash 
could still present a problem for people with res- 
piratory problems, and officials are monitoring 
the wind and weather patterns to predict where 
it will go. Other hazards include plumes of 
sulfur dioxide and a toxic haze that forms when 
lava reaches the ocean, as it did on 19 May. 

Houghton says that it may be hard to know 
when it is safe for people to return to evacu- 
ated areas. In the 1960s, an eruption at Kilauea 
stopped for a year and a half before the volcano 
resumed spewing ash. “It takes quite a long 
time to be certain things are completely over,” 
he says. “The past record suggests it might go 
for months or years.” 

Kilauea’s latest eruption “is an exciting event 
but it’s coming at a cost”, Poland says, noting
that homes have been destroyed and tourism is 
suffering. “We as the scientific community feel 
we owe it to the people being impacted to get it 
right and learn as much as we can,” he says.
The Queqiao spacecraft and two radio-astronomy experiments launched from the Xichang Satellite Launch Centre in western China on 21 May.


China shoots for Moon’s far side 


Queqiao probe carries technologies that could one day explore the Universe’s dark ages.


BY DAVIDE CASTELVECCHI 


China has taken its first major step in
a groundbreaking lunar mission. On
21 May, a probe launched from Xichang
Satellite Launch Centre to head beyond the 
Moon, where it will lie ready to act as a communications
station for the Chang’e-4 lunar lander.
The nation hopes that the lander will, later this 
year, become the first craft to touch down on the 
far side of the Moon. 

The relay probe, named Queqiao and 
designed by the Chinese Academy of Sciences, 
also carries two pioneering radio-astronomy 
experiments. Both are proof-of-principle mis- 
sions designed to test technologies for exploring 
a period in cosmic history known as the dark 
ages. These first few hundred million years of 
the Universe’s existence, before galaxies and 
stars began to form, are all but impossible to 
study from Earth. But the spectrum of radiation 
from this age — when matter was distributed 
nearly uniformly across space as a thin, cold 
haze — could reveal information about the 
distribution of ordinary matter compared with 
dark matter in the Universe. 
The first of the two experiments is the
Netherlands–China Low-Frequency Explorer
(NCLE). It will remain attached to Queqiao, which
will linger around ‘Earth–Moon L2’, a gravitational
resting point about 60,000 kilometres
beyond the Moon that tracks the Moon’s
orbit around Earth. The Dutch-built
NCLE experiment will try to exploit the
relative quiet there to measure radio waves
with frequencies between about 1 megahertz
and 80 megahertz, coming from the Solar
System, the Galaxy and beyond.

Much of this frequency band is blocked by 
Earth’s atmosphere, but cosmologists expect 
it to contain information from the dark ages. 
Around the upper end of this band also fall the 
‘cosmic-dawn’ signals from the first stars, which
lit up around 200 million years after the Big 
Bang, and were apparently detected for the first 
time in Australia earlier this year. Other experi- 
ments are trying to replicate those results — but 
the NCLE is testing technologies for identifying
lower-frequency signatures from the dark ages.

For at least part of its orbit, Queqiao will be 
eclipsed by the Moon, as seen from Earth, which 
could benefit the NCLE because its antennas 
will be further shielded from the radio noise that 
constantly leaks from our planet. Still, observa- 
tion time and the bandwidth for sending data 
back to Earth will be limited. And because 
Queqiao is designed primarily as a data-relay
station (its name comes from a folktale about 
magpies that form a bridge across the sky), it 
is not optimized for radio astronomy. That 
means it will be challenging, if not impossible, 
for this demonstrator mission to detect the 
dark-ages signal, says Heino Falcke, a radio 
astronomer at Radboud University Nijmegen in 
the Netherlands who is the experiment’s science 
leader. Nonetheless, the NCLE “is pioneering 
and an important first step toward investigating 
the dark ages and cosmic dawn”, says Jack Burns,
an astrophysicist at the University of Colorado 
Boulder who is leading a proposal for a NASA 
mission with similar objectives. 

To avoid jeopardizing the Queqiao probe, 
mission control will deploy the NCLE’s 
antennas only after the Chang’e-4 lander’s
mission is completed, says Marc Klein Wolt,
a Radboud astronomer who is NCLE’s man- 
ager. But the NCLE might go on collecting 
data for several years, he says. 


SATELLITE BREAK-OFF 

The second experiment that launched
with Queqiao consists of two smaller satellites
called Longjiang-1 and Longjiang-2,
which will detach from the mothership and 
orbit the Moon. Built by researchers at the 
Harbin Institute of Technology in China, 
the instruments will test technology for a 
radio-astronomy technique called
very-long-baseline interferometry (VLBI). This
approach combines data from multiple radio 
antennas to create images of much higher 
resolution than would be possible with a 
single dish. 

Falcke and others have long studied the 
possibility of doing VLBI with a large array 
of lunar orbiters — or on the lunar surface 
— to map variations across the sky in signals 
from the dark ages and cosmic dawn. Klein 
Wolt says that his team might experiment 
with combining data from NCLE with those 
from the two lunar orbiters, and even from a
radio antenna on the Chang’e-4 lander itself.

The Chang’e-4 mission is another step in
China's ambitious lunar-exploration pro- 
gramme, which aims to establish a Moon 
base in the next decade, and to begin human 
exploration in the 2030s. The lunar lander 
will carry a rover and was originally designed 
as a back-up for Chang’e-3, which in 2013
became the first craft since 1976 to soft-land
(rather than crash-land) on the Moon.
Chang’e-4 has now been repurposed, and
the mission's main scientific goal is to study 
the geology of the hidden side of the Moon, 
which is pockmarked with many more small 
craters than the familiar near side. 

The lander carries several experiments, 
including a sealed ecosystem, built by 
Chongqing University, which will test 
whether potato and thale-cress (Arabidopsis) 
seeds sprout and photosynthesize as silk- 
worm eggs hatch and the worms produce 
carbon dioxide. Another experiment will 
measure the radiation that will confront 
future astronauts who visit the lunar sur- 
face. The rover, which will separate from 
the lander to move around the surface of the 
Moon, will carry instruments such as a solar-wind
detector built by a Swedish team.
Open-access drive
spreads in Europe

Negotiators share tactics to broker new deals with publishers.

towards an open-access model are gaining
steam. Negotiators from libraries
and university consortia across Europe are
sharing tactics on how to broker new kinds of
contracts that could see more articles appear
outside paywalls. And inspired by the results
of a stand-off in Germany, they increasingly
declare that if they don’t like what publishers
offer, they will refuse to pay for journal access
at all. On 16 May, a Swedish consortium
became the latest to say that it wouldn't renew
its contract with publishing giant Elsevier.

Under the new contracts, termed ‘read
and publish’ deals, libraries still pay subscriptions
for access to paywalled articles, but their
researchers can also publish under open-access
terms so that anyone can read their work for
free. Advocates say such agreements could
accelerate the progress of the open-access
movement. Despite decades of campaigning for
papers to be published openly, on the grounds
that the fruits of publicly funded research
should be available for all to read, scholarly
publishing’s dominant business model remains
to publish articles behind paywalls and collect
subscriptions from libraries (see ‘Growth of
open access’). But if many large library consortia
strike read-and-publish deals, the proportion of
open-access articles could surge.

“There is a serious ground for change across
Europe,” says Koen Becking, chief negotiator
for the VSNU, a consortium of 14 institutes in
the Netherlands. In 2014, the VSNU was the
first national group to negotiate a subscription
deal that included rights for its scholars
to publish all of their work openly. It has since
agreed several more that include varying levels
of open publishing. Consortia in Austria, the
United Kingdom, Sweden and Finland have
struck similar deals, and Switzerland will start
to negotiate its first open-access contracts this 
year. A survey by the Brussels-based European 
University Association, published in April, 
reported that, last year, 11% of negotiating 
consortia in Europe made deals that took into 
account open-access publishing costs, but 63% 
planned to do so in the future. 

On 2 May, negotiators from countries 
across Europe agreed to align their bargaining 
strategies at a closed meeting in Berlin attended 
by the European Commission's special envoy 
for open access, Robert-Jan Smits. According 
to Gerard Meijer, one of the German negotia- 
tors present, consortia are “frustrated” by the 
lack of progress in talks and feel the limits of 
partnerships between institutions and large 
publishers “have been reached. It is up to us 
now to act, and to step out of these negotiations 
if these are going nowhere,” he says.

The meeting was the latest in a string of 
events in which negotiators from different 
countries swapped tactics. “More and more 
people are willing to share their experiences,” 
says Matthijs van Otegem, director of the 
library at Erasmus University in Rotterdam, 
and chair of the open-access working group at 
the Association of European Research Librar- 
ies (LIBER) in The Hague, the Netherlands. 

In September last year, LIBER published a 
list of principles to guide negotiators seeking 
to change their deals. These include ending 
non-disclosure agreements that publishers 
customarily place on contracts (which would 
enable negotiators to compare deals in differ- 
ent countries) and not agreeing to price hikes 
without open-access agreements in place. 

A key driver behind the activity in Europe is
the European Commission's goal that, by 2020, 
all research will be freely accessible as soon as 
it is published. Dutch negotiators have been 
tasked with brokering a deal that meets
this vision. In Sweden, negotiators have
set themselves the target of complete open- 
access research by 2026, and in Switzerland the 
planned date is 2024. But word is spreading 
outside Europe, too: last month, negotiators in 
Germany travelled to South Korea to discuss 
their work with consortia there, and represent- 
atives from the University of California system 
attended the Berlin meeting. 


DEAL OR NO DEAL 
The situation in Germany has shown that ‘no 
deal’ is an option, van Otegem says. Since 2016, 
a university consortium there has held out on a
new deal with Elsevier. Despite the stand-off, 
the publisher has not stopped German schol- 
ars accessing its journals — suggesting that 
universities need not fear researchers’ wrath if 
negotiations stall. Since then, other consortia 
have also announced ‘no deals’ with publishers. 
One reason that libraries no longer fear 
an end to their contracts is that a growing 
number of free versions of paywalled articles 
can be found online as preprints or accepted 
manuscripts, notes Heather Joseph, execu- 
tive director of the Scholarly Publishing and 
Academic Resources Coalition (SPARC), an 
advocacy group in Washington DC. Sci-Hub, 
a website that illicitly hosts copies of papers 
and is used by academics around the world, 
is also a big factor, says Joseph Esposito, a
publishing consultant in New York City.
“Without Sci-Hub the researchers would be 
screaming at the libraries and state agencies 
not to cut them off,” he says.

GROWTH OF OPEN ACCESS
In 2016, journals made 18.9% of papers
open immediately on publication, up from
11.5% in 2012.

Costs are a major sticking point in the stand-offs.
A spokesperson for the Royal Society of
Chemistry’s publishing arm, for instance,
says that its current dispute with the VSNU 
revolves partly around the difficulty of reach- 
ing “an agreement which makes the transition 
to open access sustainable”. (Other publishers 
approached by Nature’s news team declined to 
comment on details of specific negotiations.) 
Consortia are generally unwilling to discuss
whether the read-and-publish contracts work
out to be more expensive, but some say they 
don’t want to agree to contracts that require 
above-inflation price rises. And in the United 
States, where open access has less political 
impetus than in Europe, libraries are trying 
to save money by cancelling ‘big deal’ con- 
tracts — comprehensive, but expensive, deals 
for access to large bundles of journals — in 
favour of à la carte access to the journals their
academics use the most. This has happened 
before, but is now a more common approach, 
according to a list collated by SPARC. 

Steven Inchcoombe, Springer Nature's chief 
publishing officer, says that some deals that 
combine reading and publishing costs have 
been brokered in northern Europe because of 
strong support for open access from research 
funders, institutions and governments. But 
unless more money is available to pay for 
such deals, they are unlikely to become more 
popular in the future, he says. “It is in everyone's 
interest to solve this,” he adds. (Nature’s news 
team is editorially independent of its publisher.) 

If the stand-offs continue, the VSNU’s 
Becking thinks that negotiators might end 
up striking deals, but might also simply stop 
bargaining with particular publishers. In that 
case, universities could encourage researchers 
to disseminate their work on alternative 
platforms, he says.
GUT MICROBES JOIN THE
FIGHT AGAINST CANCER


The intestinal microbiome seems to influence how well some cancer 
drugs work. But is the science ripe for clinical trials? 


BY GIORGIA GUGLIELMI 


Bertrand Routy earned a lamentable
reputation with Parisian oncologists
in 2015. A doctoral student at the
nearby Gustave Roussy cancer centre,
Routy had to go from hospital to hospital
collecting stool samples from people who had
undergone cancer treatments. The doctors
were merciless. “They made fun of me,” Routy
says. “My nickname was Mr Caca.”

But the taunting stopped after Routy and
his colleagues published evidence that certain
gut bacteria seem to boost people's response to 
treatment. Now, those physicians are eager to
analyse faecal samples from their patients in 
the hope of predicting who is likely to respond 
to anticancer drugs. “It was an eye-opener for 
a lot of people who couldn't see the clinical 
relevance of gut microbes,” says Routy, who 
is now at the University of Montreal Health 
Centre in Canada. 
Cancer has been a late bloomer in the microbiome
revolution that has surged through biomedicine. Over the 
past few decades, scientists have linked the gut’s composi- 
tion of microbes to dozens of seemingly unrelated conditions 
— from depression to obesity. Cancer has some provocative 
connections as well: inflammation is a contributing factor 
to some tumours and a few types of cancer have infectious 
origins. But with the explosive growth of a new class of drug 
— cancer immunotherapies — scientists have been taking a 
closer look at how the gut microbiome might interact with 
treatment and how these interactions might be harnessed. 

After preliminary findings in mice and humans revealed 
that gut bacteria can sway responses to such drugs, scien- 
tists started trying to decipher the mechanisms involved. 
And researchers are launching a handful of clinical trials 
that will test whether the gut microbiome can be manipu- 
lated to improve outcomes. 

Some proponents say that strategies to mould the 
microbiome could be game-changing in cancer treatment. 
“It's a smart place to be,” says Jennifer Wargo, a surgeon-sci- 
entist at MD Anderson Cancer Center in Houston, Texas. 
But others are worried that the move to the clinic is prema- 
ture. William Hanage, an epidemiologist at the Harvard T. 
H. Chan School of Public Health in Boston, Massachusetts, 
calls the idea “phenomenally interesting”, but adds: “I have
some anxiety about the notion that only beneficial effects
are possible.”


INTRIGUING LINK 

Although the excitement about microbes and 
immunotherapy has emerged only in the past three years, 
some researchers have been exploring connections between 
gut bacteria and cancer for much longer. Scientists first 
linked the infectious bacterium Helicobacter pylori to gas- 
tric cancer back in the 1990s, for example. And since then, 
other bacteria have been associated with cancer initiation 
and progression. Some of these microbes activate inflam- 
matory responses and disrupt the mucus layers that protect 
the body from outside invaders, creating an environment 
that supports tumour growth. In other cases, they promote 
cancer survival by making cells resistant to anticancer drugs. 

But gut bacteria can also help fight tumours. In 2013, 
a group led by Laurence Zitvogel at Gustave Roussy and
one led by immunologists Romina Goldszmid and Giorgio
Trinchieri at the National Cancer Institute in Bethesda,
Maryland, showed that some cancer treatments rely on the 
gut microbiome activating the immune system. 

Zitvogel’s team found that the chemotherapy drug cyclo- 
phosphamide damages the mucus layer that lines the intes- 
tine, allowing some gut bacteria to travel into the lymph 
nodes and spleen, where they activate specific immune 
cells. For mice raised without microbes in their guts or 
given antibiotics, the drug largely lost its anticancer effects. 

Following this observation, Zitvogel decided to explore 
whether bacteria in the gut might influence responses to a 
class of immunotherapy drugs called checkpoint inhibitors. 
These drugs, typically antibodies to cell-surface molecules 
such as CTLA4 and PD1, unleash a person’s immune sys- 
tem against tumour cells, and are used to treat several types 
of cancer (see ‘A little help from their friends’). But only 
20–40% of people respond to treatment.

In 2015, Zitvogel and her team showed that microbe-free 
mice failed to respond to one such drug, and mice given a 
particular bacterium, Bacteroides fragilis, responded better 
than did mice without it.

The idea started to spread. Thomas Gajewski, a cancer
clinician at the University of Chicago in Illinois, reported
that Bifidobacterium microbes increased the response to
cancer immunotherapy in mice. These gut-dwelling bacteria
acted by boosting the ability of some immune cells to
initiate a response against tumours. 

Wargo saw these results presented at a meeting in 2014, 
and on returning to Texas, immediately started to collect 
stool samples from people with skin cancer who were about 
to undergo immunotherapy at her institution. Last November,
Wargo, Gajewski and Zitvogel all published results in
Science linking positive immunotherapy responses in peo- 
ple to specific varieties of gut bacteria. The samples that 
Routy had collected in Paris helped Zitvogel’s team to also 
show that people who had taken antibiotics for unrelated 
infections tended to respond poorly to immunotherapy. 

To solidify the relationships, the researchers transferred 
bacteria from the human participants into the intestines 
of mice with comparable cancers. Rodents who got ‘bene- 
ficial’ bacteria developed smaller tumours than did mice 
that received microbes from people who hadn't responded 
to treatment. “All of this work has been very exciting,” says 
Neeraj Surana, a microbiologist at Boston Children’s Hos- 
pital. “They've opened up the possibility for a clear thera- 
peutic application of microbiome science.” 


HEADING TO THE CLINIC

Researchers are now running with that possibility. Hassane 
Zarour, an immunologist at the University of Pittsburgh in 
Pennsylvania, partnered with the global pharmaceutical 
company Merck to collect faecal bacteria from people who 
respond to treatment with a checkpoint inhibitor and trans- 
fer them into the intestine of non-responders, a process 
called faecal microbiome transplant. Merck has invested 
about US$900,000 into this trial, which is set to start in the 
next few weeks. 

Wargo is planning a similar trial. Together with the
Parker Institute for Cancer Immunotherapy in San
Francisco, California, and the biotech company Seres
Therapeutics in Cambridge, Massachusetts, she expects to
test whether faecal transplants can reshape the gut micro-
biome of non-responders in a beneficial way.

Figure (‘A little help from their friends’): The cancer immunotherapies anti-CTLA4 and anti-PD1 remove
certain natural barriers to immune activity. But they don’t work in
every patient. Gut bacteria might provide signals to immune cells
that help to supercharge their tumour-fighting efforts.
Anti-CTLA4 allows tumour-fighting T cells to
multiply. A surface molecule from the bacterium
B. fragilis could further boost the number of […]
Anti-PD1 blocks a molecule that shields tumour
cells from attack. Several bifidobacteria species
seem to stimulate dendritic cells in the tumour.
That boosts the number of cancer-killing
cytotoxic T lymphocytes (CTLs) and ensures that
there are enough to fight the uncloaked tumour.

These microbiome transplants are becoming a 
mainstream treatment for some non-cancer illnesses. In 
February, for example, the Infectious Diseases Society of 
America recommended that physicians use these proce- 
dures to treat people with bowel infections caused by the 
bacterium Clostridium difficile who had failed to respond to 
other treatments. But the approach has downsides. To avoid 
the risk of inadvertently infecting people with pathogenic 
microbes, researchers must be careful in how they select 
donors and screen faecal material before transferring it 
to recipients. That's why, in addition to faecal transplants, 
Seres Therapeutics, the Parker Institute and Wargo will test 
a pill containing a set of spore-forming bacteria that have 
been purified from the faeces of responding patients. 

Gajewski and his partners at Evelo Biosciences, a biotech 
company in Cambridge, are using a similar approach. Their 
trial will assess the effects of two pills containing single 
bacterial strains in people with different types of cancer, 
including colon and skin cancer. 

Zitvogel is not planning to start clinical trials but she 
has co-founded the Delaware-based start-up EverImmune, 
which is developing a microbiome-based pill. 

It’s still unclear exactly how microbes might interact with 
immunotherapeutics. A widely accepted hypothesis is that 
some boost the body’s response against tumours by regulat- 
ing how easy it is to activate the immune system. But the 
precise mechanisms, including which bacteria modulate 
which immune cells, remain a mystery. 

The researchers hope that the clinical trials will help to 
clarify things. Wargo, for instance, is exploring bacterial 
metabolites. Her team hopes to find specific metabolic 
signatures of a good outcome in the stools and blood of 
people who respond to therapy, as well as to document the 
numbers of immune cells in the blood and tumours of trial 
participants. 

Gajewski suggests that microbes might be unleashing the 
immune response by stimulating the gut cells to produce 
certain molecules. His team is testing whether circulating 
immune-cell precursors change their behaviour when spe- 
cific bacteria are given to mice. At the same time, the group 
is trying to pin down which species might be driving the 
positive outcomes. 


TOO EARLY, OR JUST RIGHT? 

Given the uncertainties, some scientists argue that testing 
these approaches in humans is risky. Some trial participants 
could experience side effects, Surana says. And changing 
the make-up of an individual’s microbiome might predis- 
pose them to other health problems. 

Faecal transplants come with a lot of unknowns. They 
have proved safe and effective in many people without can- 
cer, Wargo says, but they have also been associated with 
unexpected effects, including one case in which the proce-
dure led to weight gain and obesity. “Should we look for
safety signals on these trials? Absolutely,” Wargo says. “But I
strongly feel that we need to go into these trials. We need to 
design them well. We need to really learn from these trials.” 

Gajewski, who plans to test the effects of just one bifi- 
dobacterial strain at a time, says there’s good reason to be 
confident. “People have eaten bifidobacteria for a thou-
sand years,” he says. The bacteria are present in the gut of
infants and decline in number as the people grow up, so 
they should at least be safe, he adds.

But it’s unclear whether a single species can help people
with cancer and, if so, what bacterium that is. The papers 
published in Science last year all associated different bacteria 
with the best outcomes, even for the same cancer and therapy. 

The researchers looked at people with cancer from 
France and the United States, so diet could account for 
some of the differences, Wargo says. But variations in 
sample collection, data analysis and statistical meth-
ods could also have skewed the results, says Joël Doré, a
biologist at the French National Institute for Agricultural 
Research in Paris who in 2011 helped to launch the Inter- 
national Human Microbiome Standards (IHMS) project 
with the aim of improving data reproducibility in micro- 
biome research. 

Hanage says that even the two studies that analysed
people in the United States with the same type of cancer 
identified only a partially overlapping set of microbes asso- 
ciated with positive outcomes. If researchers don't work out 
the reason for these differences, they might not be able to 
interpret the outcomes of the trials, Hanage says. 

Before starting clinical trials, the three groups should try 
to reproduce each other’s results and converge on a set of 
‘beneficial’ microorganisms, Hanage argues. “Any of these
bacteria could be a useful approach.” But inconsistencies 
might mean that the results are not reproducible. 

It’s a concern common to microbiome research. “A lot of 
findings have proven to either not stand up or be consider-
ably more complicated than they first appeared,” Hanage
says. Standards such as those developed by the IHMS pro- 
ject should help, but scientists will be reluctant to take them 
on board, says Susan Erdman, a microbiologist and cancer 
biologist at the Massachusetts Institute of Technology in 
Cambridge. Doing so would come at the cost of innovation, 
she argues — it’s by experimenting in different settings that 
researchers make discoveries. 

Wargo says that the community should standardize its 
approaches for collecting samples and doing analyses, as 
well as for validating studies in larger groups of patients. 
Since last year, her group has analysed stools from more 
than 500 people with skin cancer who had received dif- 
ferent therapies. In parallel with the Paris-based team led 
by Zitvogel, the researchers are analysing patients treated 
with two combined immunotherapies to work out which 
gut bacteria mediate a response to that combination. Wargo 
hopes that the gut microbiome could eventually help to 
identify which patients will respond to which anticancer 
treatments. “Can we use it as a biomarker? It’s a provocative 
question,” she says. 

In the short term, there will be a whole lot more sam- 
ple collection. And this time around, it’s likely that fewer 
oncologists will raise an eyebrow, says Routy, who is now 
investigating how the gut microbiome boosts immunother- 
apy with his own group. In cancer therapy, “gut microbes 
have gone from ignored to super-popular organisms”, he
says. Now, they’ll just have to live up to their reputation.


Giorgia Guglielmi is an intern with Nature in 
Washington DC.


Women in Odisha, India, where in 2014 the Dongria Kondh forest tribe won a lawsuit to stop a bauxite mine from opening.


The global south is rich in sustainability lessons


Educators must share how communities in the developing world manage 
environmental change — a Western bias limits progress, argues Harini Nagendra. 


In a Bangalore slum, Dhanalakshmi
tends six plant pots balanced on a wall.
They contain shoots of holy basil (or
tulasi, Ocimum tenuiflorum). I asked her
why she does this, in a cramped space with
an unreliable water supply. She told me that
the plants replace her tiny roadside kitchen
garden, which she lost when the street was
widened. The wind blew the basil seeds into
the pots. “How can one turn away a guest,
even if they come uninvited?” she said.

Dhanalakshmi’s deep, personal
connection to nature shapes her actions,
even though she lives far from the 
countryside. Such attachments are shared 
by many people around the world. They 
run through centuries of Indian thinking 
on sustainability: nature offers material 
benefits; it is part of people’s cultural 
identities and often viewed as sacred. Pro- 
tecting nature also confers social merit. A 
stone inscription from AD 1340 describes 
the motivation of Chenneya Nayaka, 
the ruler of a region near Bangalore, for
building an irrigation tank: “to support
animals, cattle, birds, and all other living
beings, and the service at all times of (the
goddess of water) Ganga Devi”¹.

In the early twentieth century, Mahatma 
Gandhi fought poverty and injustice through 
peaceful civil resistance. He championed 
local production, education, health care 
and self-sufficiency. Inspired by Gandhi's 
ideas, members of the Chipko and Appiko 
environmental movements hugged trees 
in the 1970s and 1980s to prevent them
from being felled². In 2014, after 12 years
of campaigning, the Dongria Kondh forest 
tribe in India’s Odisha state won a lawsuit 
to stop a bauxite mine from opening and 
ruining the hillsides that they revered and 
depended on for food. 

These strongly rooted local movements 
have brought sustainability issues into 
everyday conversations in India. They have 
inspired generations of activists. Yet most 
university courses on sustainability omit 
them. Teachings still have a Western focus, 
even in India. Most books on sustainability 
frame the discourse in terms of Earth's finite 
resources and rising population. 

The limited Western view of sustainabil- 
ity is stifling progress, just as the world faces 
crises over water, 
climate change, 
energy and biodi- 
versity. That view 
also does a disser- 
vice to the variety 
and creativity 
of thinking and 
actions on sustain- 
ability in societies across the globe. Develop- 
ing countries face the most acute challenges 
in this regard, yet they have the widest gaps 
in knowledge. Solutions that work in one 
place might fail in another. Excessive con- 
sumption, inequity and social injustice are 
not questioned enough. 

At Azim Premji University in Bangalore, 
my colleagues and I see sustainability dif- 
ferently. We have moved away from framing 
it exclusively around limits to growth and 
conserving natural resources. Instead, we 
emphasize the connections between com- 
munities, ecosystems and social justice. In 
an online course, for instance, we discuss 
the ‘3 Fs’ — finitude (or limits), fragility 
and fairness (see go.nature.com/2t3rfdd). 
As well as university students, from under- 
graduate to postgraduate level, we teach 
bureaucrats, educators, corporate executives 
and practitioners through online courses 
and in-class curricula. 

Sustainability education must be more 
globally inclusive. Only then can the dis- 
cipline deliver the transformative change 
the world needs, rather than tinkering with 
business-as-usual. 


LOCAL DIFFERENCES 

Sustainability is usually discussed as if ‘one
size fits all’. Calls to action target the individ-
ual: plant a tree, ride a bike, compost your
food scraps (see go.nature.com/2rwsupi). 
Or they focus on markets and corpora- 
tions: invest in renewable energy and green 
buildings. On these terms, local contexts are 
irrelevant and materials matter more than 
people — buying an electric car is ‘green’, 
even if the cobalt for its battery might have 
come from a small-scale artisanal mine with 
horrific labour conditions.

Most of the sustainability workshops I’ve
attended in the past three years in India 
focused on reducing resource use. We 
discussed emissions cuts, recycling and 
the circular economy while sitting in air- 
conditioned rooms in luxury hotels, sipping 
bottled water. Poverty, environmental jus- 
tice and governance were not mentioned. 

Most ‘classic’ writings on sustainability 
present people as the problem, not as a col- 
lective source of strength. These include 
Garrett Hardin’s 1968 essay The Tragedy of 
the Commons, Paul Ehrlich’s 1968 book The 
Population Bomb (Sierra Club/Ballantine 
Books) and The Limits to Growth, a 1972 
report by Donella Meadows and colleagues. 
For example, The Population Bomb opens 
by describing the streets of Delhi in 1966 
as “alive with people” — eating, washing, 
sleeping, visiting, arguing, screaming, 
urinating, clinging to buses and herding 
animals. An obsessive focus on overpopu-
lation has led to millions of forced steriliza-
tions worldwide³.


Much is left out from these accounts. 
Political economist Elinor Ostrom, in her 
influential work on the commons⁴, dem-
onstrated the powerful capacity of people
when they are organized in collectives. 
Discussions of increasing consumption are 
largely absent from classic writings on sus- 
tainability, despite the cases for sufficiency 
made by twentieth-century Indian think- 
ers. German economist Ernst Friedrich 
Schumacher, whose book Small Is Beautiful 
(Blond & Briggs, 1973) championed local, 
sustainable technologies that empower peo- 
ple, was inspired by Gandhi and the Indian 
economist J. C. Kumarappa. 

In our courses, we shine a light on these 
issues. And we show our students how 
iconic local movements in the global south 
have been just as influential in their regions 
as US environmentalists such as John Muir, 
Aldo Leopold and Rachel Carson were in 
theirs. In Latin America, for example, the 
concept of buen vivir (living well) has wide-
spread resonance. It espouses harmony with
nature, prioritizing community well-being
and respecting plurality of thought⁵. From
Nepal to Mexico, indigenous communities 
have managed forests and prevented poach- 
ing in locally run reserves. Those fenced off 
by governments have fared much worse.

We dissect lots of case studies in our
courses and workshops. One clear lesson is 
that transplanted solutions often backfire. 
For example, a ban on livestock grazing 
degraded the ecosystem of the Keoladeo 
National Park — a region of wetlands 
in Rajasthan that is rich in bird life. The 
Indian government introduced the policy, 
following US practices, to protect suppos- 
edly pristine landscapes from trampling. 
Cattle were duly ejected from Keoladeo in 
1982. But the diversity of birds and other 
wildlife plummeted — the canals became 
clogged with weeds and grasses, which 
were previously eaten by the cattle⁶.

Community relations are important.
In 1974 in southern India, the indigenous 
Soliga tribe was banned by the government
from setting controlled fires in the forests
of the Biligiri Rangaswamy Temple Tiger 
Reserve. The community warned policy- 
makers of the consequences, but they did 
not listen. Without control through burn- 
ing, the invasive shrub Lantana camara 
ran riot, choking vegetation and reducing 
fodder and food⁷. A well-meaning policy,
influenced by ecological ideas about succes- 
sion, has ended up damaging the forest as 
well as relationships between the tribe and 
the forest department. 

Technologies, too, can bring more harm 
than good. In the push for solar energy, 
land has been acquired forcibly from poor 
farmers. In Nepal and India, replacing tra- 
ditional mud-lined irrigation tanks and 
channels managed by farmers with cen- 
trally managed cement-lined canals has 
increased maintenance costs and damaged 
social capital. The canals silt up, and farm- 
ers no longer meet and work together to 
repair them. 

In all these cases, a science-focused
administration ignored the knowledge of local
communities. Officials failed to appreciate
the fragile social and ecological interconnec-
tions in these densely inhabited, biodiverse
landscapes.

Protesters in Bhopal march against proposed dams on India’s Narmada River.

Our students tell us that some of these 
findings come as a surprise. The fragil- 
ity of communities must be considered 
alongside ecosystems. No technology, no 
matter how good, is a magic silver bullet. 
Narratives such as Dhanalakshmi’s can 
contradict widely held assumptions — for 
instance, that strong ties to nature exist only 
in pre-industrialized societies. 

Some students have reversed their 
opinions. For instance, those who initially 
supported diverting water from full rivers 
into others that ran dry now understand 
the importance of maintaining river- 
basin flows for ecosystems and communi- 
ties. Practitioners have been sufficiently 
inspired by these examples of ecosystem 
management through fire and grazing to 
begin altering their approaches to working 
with communities. 


GRASS-ROOTS CAMPAIGNS 

There are hundreds of environmental 
movements across Asia, Africa and Latin 
America. Although they are diverse, they 
have features in common. Such movements 
emphasize environmental justice and tend 
to emanate from local cultures. They are 
often led by women’s collectives, and use 
non-violent means of protest. They ques- 
tion the industrial route to development and 
champion collective action. Social justice 
and the rights of nature are given the same 
prominence as limited resources. Multi- 
generational thinking often features. 

For example, in 2008, catalysed by a 
coalition of indigenous groups, Ecuador 
became the first nation to incorporate rights 
of nature in its constitution. Bolivia followed
suit in 2009. These rights include protection, 
restoration and respect for existence. 

Grass-roots campaigns can be powerful 
in questioning unsustainable paradigms 
and changing minds, even when they 
don't prevail. For 33 years, the Narmada 
Bachao Andolan mass social movement 
has marched and brought court cases to 
stall dam construction on India’s Narmada 
River, which runs from Madhya Pradesh to 
the Arabian Sea. It did not stop the dams, but 
it has raised awareness of the consequences 
for people and places of big, top-down 
developmental projects. 

Local groups tackle a wide range of issues. 
In Indian cities, groups such as Hasiru 
Dala work with poor rag pickers to collect 
and recycle waste. In Bangalore, residents 
are liaising with municipal authorities, 
cattle grazers and fishers to restore the
Kaikondrahalli and Jakkur lakes. Across
rural India, community groups collaborate 
to protect local forests, create sustainable jobs 
and provide incomes while protecting bio- 
diversity. Agro-ecological initiatives such as 
the Deccan Development Society in Medak 
district, the Foundation for Ecological Secu- 
rity in Anand and the Timbaktu Collective 
in Anantapur district work with farmers to 
restore forests and common land, and to 
promote organic methods of farming and 
soil-friendly crops such as pulses and millets. 

To encourage networking and dialogues, 
the Vikalp Sangam or Alternatives Conflu- 
ence, a non-profit collaborative discussion 
forum and website, documents grass-roots 
experiences (www.vikalpsangam.org). 
Governments can help to scale up such initi- 
atives. For instance, a programme of the state 
government of Kerala called Kudumbashree 
works with women’s groups on empower- 
ment, livelihoods and sustainability. 

Governments and grass-roots initiatives 
cannot solve all sustainability issues in iso- 
lation, especially in a country such as India 
that is accelerating towards an industrial- 
ized and urban future. We just have to look 
outside our classroom windows to see the 
negative impacts of India’s relentless growth. 

But sustainability and conservation are 
dismal disciplines. The next generation 
needs cases of hope to counter narratives 
of gloom and doom. And they need to 
know that successes can be found on their 
doorstep, not just in the West. 


GLOBAL LESSONS 

Sustainability curricula must be rethought. 
It is important to learn about, teach and 
communicate ways to reduce resource 
consumption. It is even more crucial — and
much harder — to transform world views
and dismantle unsustainable paradigms of
development and growth.


488 | NATURE | VOL 557 | 24 MAY 2018


Indigenous people protesting in Brasilia over the government’s failure to safeguard their land.

Sustainability needs to be defined as 
encompassing natural resource conserva- 
tion as well as social justice and collective 
action. Such world views must go beyond 
purely utilitarian concepts of nature. Edu- 
cators should tailor their lessons to be more 
globally inclusive […]
[…]ity, should develop
more diverse educational and outreach
materials. Academics, indigenous and local 
communities, practitioners and environ- 
mental activists must all be involved. 

To expand the range of cases examined, 
researchers must document contempo- 
rary grass-roots attempts to engage with 
sustainability in the global south, such as 
those discussed here. A good start is the 
Seeds of a Good Anthropocene project run 
by Future Earth. This shares case studies 
and tools to support positive visions of 
futures that are socially and ecologically 
desirable, just and sustainable. More exam- 
ples should be drawn from Africa, Latin 
America and Asia. Researchers from the 
global south should lead study projects, 
and funding agencies should provide 
financial support at the scale needed to 
maintain such leadership. 


Key topics to explore include how 
communities reshape traditional 
approaches to grapple with twenty-first- 
century challenges, how they address 
gender and caste inequities, and how 
philosophies and faiths influence people’s 
attitudes to nature. For example, Kudum- 
bashree, the Foundation for Ecological 
Security and the Deccan Development 
Society are enabling women farmers, fish- 
ers and grazers to take the lead in public 
decision-making. The Dongria Kondh 
tribe’s belief that each component of the 
landscape has sacred significance shaped 
its rejection of commercial mining⁸.

Sustainability curricula cannot rest on 
just-so stories. A set of universal principles 
needs to be derived, while respecting local 
contexts. Ostrom’s framework⁴ for govern-
ing shared resources is a good basis. Local
crafting of rules, limiting free riders through 
monitoring, and strong local leadership are 
used in such disparate cases as community 
forests in Nepal, Subak irrigation systems in 
Bali’s rice fields, alpine grazing commons 
in Switzerland and Satoyama agricultural 
landscapes in Japan. 

Science and technology can only go so 
far. Without understanding alternative 
imaginations — such as the cosmology of 
the Dongria Kondh or the compassion of 
Dhanalakshmi — we limit our power to 
effect change.


Harini Nagendra is professor of 
sustainability at Azim Premji University, 
Electronic City, Bangalore, India. 
e-mail: harini.nagendra@apu.edu.in 


1. Nagendra, H. Nature in the City: Bengaluru in 
the Past, Present and Future (Oxford Univ. Press, 
2016). 

2. Guha, R. How Much Should a Person Consume? 
Environmentalism in India and the United States 
(Univ. California Press, 2006). 

3. Connelly, M. Fatal Misconception: The Struggle to 
Control World Population (Harvard Univ. Press, 
2008). 

4. Ostrom, E. Governing the Commons (Cambridge
Univ. Press, 2015).

5. Gudynas, E. Development 54, 441–447 (2011).

6. Lewis, M. Conserv. Soc. 1, 1–21 (2003).

7. Madegowda, C. Econ. Polit. Wkly 44, 65–69
(2009).

8. Tatpati, M., Kothari, A. & Mishra, R. The Niyamgiri
Story: Challenging the Idea of Growth without
Limits? (Kalpavriksh, 2016); available at
https://go.nature.com/2k9ju8z


CORRECTION 

In the Comment ‘Cybersecurity needs 
women’ (Nature 555, 577-580; 2018), 
the photo of female programmers was
captioned incorrectly. They were at the 
US Army Ballistics Research Laboratory 
in 1962, not working on ENIAC at the 
University of Pennsylvania in the 1940s. 
Also, the figure of 57% cited for women in 
the US workforce was actually for women
in the US professional workforce.


Human chromosomes and a nucleus in a false-colour image taken by scanning electron microscope.


Beyond the gene 


Nick Lane relishes Carl Zimmer’s history of heredity in 
all its messiness, from genes and culture to epigenetics. 


Few subjects have afforded more room
for doubt, or caused more harm
through false certainty, than heredity.
In She Has Her Mother’s Laugh, an illuminat-
ing survey of the concept through history,
science writer Carl Zimmer shows that scien-
tists have often clung to travesties of the truth
— and that we are still in danger of doing so.
The book is a beguiling narrative of more
than 600 pages. It blends popular science and
history with a personal journey, culminat-
ing in a plea for a nuanced view of heredity.
Zimmer ably navigates some of the most
fraught developments in research, politics,
religion and race: from eugenics, slavery
and genocide to IQ and genetic engineering
in humans. He combines a deep personal
empathy with clear scientific understanding.
For instance, in presenting controversial fig-
ures such as Henry Goddard — who coined
the term ‘moron’ and helped to foster the US
eugenics movement in the early twentieth
century — he examines their hopes, fears
and delusions, before dispassionately gut-
ting their scientific errors and the disastrous
consequences.


Compellingly, Zimmer delves into his
own genome. After having it sequenced
at 90% coverage by Illumina in San Diego,
California, he got his hands on the raw
data, and approached experts such as Dina
Zielinski of the New York Genome Center
to help him unravel his genes’ secrets. Zimmer
uses this backstory to illustrate how genomes
break up into millions of short stretches of DNA,
each with its own history from around the world.

She Has Her Mother’s Laugh: The Powers, Perversions, and Potential of Heredity
CARL ZIMMER
Dutton (2018)

Being told you have ancestors everywhere 
is one thing; it’s quite another to pin that down 
with visceral intensity. Of the 3,559,137 bases 
in Zimmer’s genome that differ from the 
human reference (a representative sequence 
based on a number of donors), he shares
1.4 million single-nucleotide polymorphisms,
or SNPs, with two volunteers from China
and Nigeria — plus another 530,000 with 
the Chinese individual and 440,000 with 
the Nigerian. On top of his roughly 1% 
Neanderthal inheritance (standard for a per- 
son of European descent), Zimmer even has 
a few Denisovan genes. We should think of 
the Denisovans as the eastern Neanderthals, 
he explains. One of their genes, EPAS1, might 
even have helped Tibetans to adapt to high 
altitudes, although most Denisovan DNA 
dwindled, leaving little more than a hint of 
our species’s promiscuous past. 

At a deeper level, the book is a serious 
treatise on why we need to overhaul our 
views on heredity. Zimmer shows how the 
idea evolved from medieval times, with the 
passing down of possessions, to our modern 
focus on genes. He recounts how nineteenth- 
century genetics pioneers Gregor Mendel 
and August Weismann seemed to bring 
clarity by defining simple laws of inheritance 
in sexual organisms, and by distinguishing 
between sex cells in the germ line and cells 
in the rest of the body (see J. Maienschein 
Nature 522, 31-32; 2015). But heredity soon 
returned to a swamp of ambiguity. Charles 
Darwin's cousin Francis Galton, a deeply 
flawed Victorian statistician and racist 
(who in 1904 founded what would become 
the Galton Laboratory at University College 
London; see go.nature.com/2i6uelm), crops
up repeatedly, each time with a new layer of 
nuance or downright murkiness. 

It took the best part of a century for 
Mendelian genetics to be fully reconciled 
with complex hereditary traits such as 
height. Sophisticated statistical methods 
reveal such traits to be ‘omnigenic’, influ-
enced by millions of genetic markers.
Intelligence is even worse; fairly heritable, 
certainly, but with a complexity that mocks 
simple ideas of Mendelian inheritance. 

The book goes on to tackle meiotic drive, 
in which ‘selfish’ genes evade Mendel’s laws 
by killing the 50% of sex cells that lack the 
selfish elements, so almost all the offspring 
inherit the selfish genes. Then we're onto cell 
lineages, where mutations acquired during 
development make genetic mosaics of us all; 
and microchimaeras, in which cells slip, in 
both directions, across the placental barrier 
between mother and fetus, sometimes persist- 
ing for decades and colonizing whole tissues. 
(The entire lobe of one woman’s liver, Zimmer
notes, was composed of Y-chromosome- 
bearing cells from a male fetus, the paternity 
of which could be traced to her boyfriend.) 

Zimmer explicates transmissible tumours, 
which infect species including dogs and 
Tasmanian devils, and can persist in popula- 
tions for thousands of years, picking up new 
mitochondria from their hosts. He treats 
transgenerational epigenetic inheritance 
with due care, showing how some genetic 
settings controlled by chemical changes can 
be passed on with the genes themselves, 
modulating their activity over
multiple generations. That can be seen 
in eighteenth-century taxonomist Carl 
Linnaeus’s ‘monstrous’ peloria, a toadflax
(Linaria vulgaris) with unusual, trumpet- 
shaped flowers — “no less remarkable 
than if a cow were to give birth to a calf 
with a wolf’s head”, as he put it. 

Zimmer completes his tour with chap- 
ters on the microbiome (some of which is 
as heritable as anxiety, and partly accounts 
for the inheritance of traits including 
weight), and cultural inheritance. Genes 
are expressed in a human-altered environ- 
ment, Zimmer notes, and their effects are 
as plastic as the culture that shapes their 
selection, right down to social inequali- 
ties. Our inherited environment governs 
our future more rigidly than our genes. 

In this encompassing view of heredity, 
we get a correspondingly nuanced 
vision of what, for example, germline 
editing using CRISPR will really mean. 
By acknowledging the ambiguous way 
in which genes actually work, and by 
embracing all these other factors that 
shape our lives, we make CRISPR less 
threatening because it is less definitive. 

Zimmer deconstructs the idea of the 
body as a genetic temple, built on Mendel’s
sacrosanct ‘laws’, along with genetic
determinism. Instead, he calls for a view 
that includes “culture, epigenetic marks, 
hitchhiking microbes, or channels we 
don't even know about yet”. His argument 
is balanced and fair, comprehensive and 
bang up to date. Whatever your views on 
the power of genes versus other forms of 
heredity, you will be in for a few surprises. 

There are some weaknesses. Zimmer 
makes no real attempt to explain how 
Mendel’s laws arose in our single-celled 
ancestors, and offers rather cursory 
descriptions of early evolution. And his 
sympathy for the underdog can go too far. 
His portrait of crystallographer Rosalind 
Franklin, for example, seemed to me too 
partial. You would never imagine, from 
Zimmer’s depiction of her meticulous 
science, that Franklin had circulated an 
obituary of the DNA helix nine months 
before Francis Crick and James Watson's 
paper on the double helix appeared in 
Nature. But these are quibbles. 

In She Has Her Mother’s Laugh, Zimmer 
has built a subtle, multifaceted and deep 
understanding of heredity, grounded in 
revelatory insights from genome sequenc- 
ing. And he shows that we will need it to 
face our uncertain future.


Nick Lane is professor of evolutionary 
biochemistry at University College 
London, and author of four books on 
evolution, most recently The Vital 
Question: Why is Life the Way it Is? 
e-mail: nick.lane@ucl.ac.uk 
Can robots make art?


Laura Spinney encounters a Paris exhibition that 
probes the concept of algorithmic creativity. 


Visiting the exhibition Artists and
Robots at the Grand Palais in Paris, I 
happened on the artist ORLAN, best 
known for her work involving body modi- 
fication. She was standing close to her 2017 
work ORLAN and the ORLANOID, in which 
her video presence interrogates a lookalike 
robot on matters of life and death. Having 
borrowed the robot's lensless glasses for a 
photo shoot, she needed her own back. I was 
struck by the robot's lack of reaction as she 
made the swap. It underscored my answer to 
the question posed by this exhibition: can a
robot create a work of visual art?
My feeling is no, for the simple reason that
it can’t see. I recommend the show,
nonetheless. It forced me to examine
what I mean by seeing or, more
broadly, sensing
the world, and hence what I mean by art.
Artists and Robots showcases robots and 
their output in roughly chronological sec- 
tions. The first recounts how, starting in the 
1950s, visionary artists such as Jean Tinguely
and Nicolas Schöffer built robots (to begin
with, no more than collections of mobile parts
driven by motors) to create kinetic art. The
second tracks that impulse forward from 
the digital revolution, starting in the 1970s.
And the third, optimistically entitled “The
robot emancipates itself”, explores their
present status and looks to the future.

Artists and Robots
Grand Palais, Paris.
Until 9 July 2018.

When robots were all jointed arms and 
motors, they executed an artist’s vision chan- 
nelled by their own capacities as machines. 
Modern French artist Patrick Tresset’s ironic
spin on this relationship features in the 
first section of the show. In the installation 
Human Study #2, three sets of robot arms and 
cameras — the ‘hand’ and ‘eye’ — repeatedly 
draw a set of objects including a stuffed fox 
and a human skull. They are programmed to
copy both the objects and Tresset’s drawing 
technique, while introducing small variations 
that he characterizes as artistic, expressive 
and obsessional. It’s through such serendipi- 
tous additions and mistakes, the artist seems 
to suggest, that the greats became great. 

The digital revolution ushered in software 
and algorithms as artists’ tools or assistants, 
and the possibilities exploded. We see this in 
stunning works in the second section, from 
conceptual artist Joan Fontcuberta’s self- 
described “hallucinatory” landscapes to lab- 
yrinthine wallpaper from multimedia whizz 
Peter Kogler. This covers an entire room, so 
that we seem enclosed in an optical illusion. 
For me, the works’ technical sophistication 
seems only to accentuate their soullessness, 
and never more so than when they show up
the fallibility of human perception. 

In 2003-04 paintings from his Orogenesis 
series, for example, Fontcuberta takes algo- 
rithms that create 3D landscapes from 2D 
map coordinates, and forces them to re-inter- 
pret the landscape paintings of artists such as 
J. M. W. Turner and Paul Cézanne. The results 
are highly naturalistic, but they reminded me 
of Swiss writer Charles-Ferdinand Ramuz’s 
idea that once the human element is banished 
from a place, it becomes a non-place. 

Likewise Michael Hansmeyer’s 2017 
Astana Columns. These architectural forms, 
created by an algorithm that applies evolu- 
tionary principles to repeatedly subdivide a 
Doric column, were assembled from laser- 
cut cardboard and other materials. They 
provoke awe through their sheer complex- 
ity, but in the way that a termite mound does: 
what's impressive is not that they were imag- 
ined, but that they were unimagined. 

By the time I reached the future-facing 
section, where I bumped into ORLAN, I 
had concluded that, notwithstanding the 
section's title, the robot had not emanci- 
pated itself. Pascal Haudressy’s 2009 anima- 
tion Brain, for example, evolves thanks to 
glitches the artist introduced into the gov- 
erning algorithm, that force the computer 
to continuously recalculate the coordinates 
of each pixel. Ultimately, however, it is less 
impressive than animations of the actual 
evolution of the human brain. 

Although artificial intelligence has 
advanced in leaps and bounds since the 
1950s, artificial imagination has yet to get off 
the starting blocks. As curators Laurence Bertrand
Dorléac and Jérôme Neutres suggest in
an explanatory video, these are artists’ robots 
rather than robot artists. But if Artists and 
Robots doesn't tell you what art is, it does venture
into fascinating new territory to tell you
what it’s not: random copying errors might be 
necessary, but they are also insufficient. 

That said, perhaps robot imaginations 
have already liberated themselves outside 
the confines of human artists’ studios, and 
their art is radically different from ours; so 
different that we don’t recognize it when we 
see it, glasses or no glasses. I can’t wait for the 
first exhibition curated by robots, assuming
it’s advertised to non-robots.


Laura Spinney is a writer and science 
journalist based in Paris. 
e-mail: Ifspinney@gmail.com 


CORRECTION 

In the article ‘Feynman at 100’ (Nature 
557, 164-165; 2018), a picture caption 
mistakenly referred to Richard Feynman 
lecturing at the California Institute of 
Technology; the picture was actually taken 
at California State University, Long Beach. 
Books in brief


Tesla: Inventor of the Modern 

Richard Munson W. W. NORTON (2018)

Around the turn of the twentieth century, Serbian-born visionary 
Nikola Tesla authored transformative inventions from the alternating- 
current system to remote control, drones and (arguably) radio. He 
even foresaw radar, mobile phones and the Internet. Yet, as Richard 
Munson reveals in this penetrating biography, Tesla’s lack of business 
sense allowed others to prevail. Munson makes vivid the genius’s 
eventful life, from his mother’s inspirational labour-saving inventions
to his psychological complexity — and his estimable belief that 
“technology should transcend the marketplace”. 


Chernobyl: The History of a Nuclear Catastrophe 

Serhii Plokhy BASIC (2018)

Soon after midnight on 26 April 1986, a turbine test at Ukraine’s 
Chernobyl nuclear power plant went stupendously wrong. The 
explosion released 14 exabecquerels of radiation; the fallout 
contaminated 20% of neighbouring Belarus and crossed more than 
half of Europe. Historian Serhii Plokhy’s deft, richly detailed account 
draws on newly opened archives and weaves in stories of players 
such as Chernobyl director Viktor Briukhanov. The disaster’s roots,
he asserts, were a toxic tangle of shoddy construction, human error, 
flawed governance and complacency in the Soviet nuclear industry. 


The Digital Ape 

Nigel Shadbolt and Roger Hampson SCRIBE (2018) 

Numbed by dire warnings of technological Armageddon? Computer 
scientist Nigel Shadbolt and economist Roger Hampson dispel the 
miasma with this superb survey of the landscape we “digital apes” 
have wrought. Humanity’s tool use, spanning everything from hand- 
axes to CRISPR, has spawned marvels such as a hyper-networked 
society, “social machines” like Wikipedia and artificial intelligence. 
But to avoid succumbing to the inherent dangers, Shadbolt and 
Hampson urge wise choices: to hold Silicon Valley to account, ensure 
transparent algorithmic decision-making and own our own data. 


How to Change Your Mind 

Michael Pollan PENGUIN PRESS (2018) 

Journalist Michael Pollan explored psychoactive plants in The Botany 
of Desire (2001). In this bold, intriguing study, he delves further, 
homing in on psychedelic compounds currently under study, such as 
psilocybin. He meets a vast cast of researchers, including mycologist 
Paul Stamets and neuroscientist Robin Carhart-Harris, who works on 
neural correlates of the psychedelic experience. Pollan even “shakes 
the snow globe” himself, chemically self-experimenting in the spirit 
of psychologist William James, who speculated about the wilder 
shores of consciousness more than a century ago. 


Still Waters 

Curt Stager W. W. NORTON (2018)

“There is nothing like a lake to reflect and reveal the world.” So 
declares ecologist Curt Stager, whose lyrical evocation of ‘living 
waters’ offers geological and biological revelations. He also 
probes our relationship to lakes as one body of water to another, 
examining mud cores from Walden Pond in Massachusetts 
(immortalized by nineteenth-century naturalist Henry David 
Thoreau), flesh-dissolving alkaline minerals in Tanzania’s Lake 
Natron, and the endemic species crowding crystal-clear Lake 
Baikal in Siberia. Barbara Kiser 
Recruit fresh talent
for coral reefs 


Coral reefs are again in the 
spotlight, having suffered mass 
mortality over the past two years 
from global bleaching events. 
Before reef resilience runs out, 
researchers must move beyond 
lamenting corals’ lost pristine 
state and develop pragmatic 
solutions. In our view, these 
are likely to stem from a more 
diverse set of stakeholders than 
have participated so far. 

We must ensure that reefs 
can continue to provide well- 
being for millions of people in 
the future, despite widespread 
alterations in their biological 
state. Degraded reefs still have 
the potential to provide fisheries 
benefits, cultural value and 
other sources of revenue (such 
as tourism), although all of these 
are likely to be reduced. 

With 2018 designated 
the International Year of 
the Reef, fresh perspectives 
and approaches are needed 
(S. A. Hewlett et al. Harvard 
Bus. Rev. 91, 30; 2013). New 
recruits should come from a
greater variety of sectors (such 
as development, health and 
governance) and from a wider 
set of disciplines (such as the 
social sciences — including 
psychology, economics, political 
science and geography) than 
today’s conservationists. Young 
scientists and researchers 
from the global south will 
be particularly important 
contributors. 
Gabby Ahmadia* World Wildlife 
Fund, Washington DC, USA. 
gabby.ahmadia@wwfus.org 
*On behalf of 14 co-signatories (see 
go.nature.com/2kwzqk7 for full list). 


Atacama imperilled 
by lithium mining 


The demand for lithium — 
used in rechargeable batteries 
for mobile phones, electric 
vehicles and other devices — 
caused a 13% surge in global 
production last year (go.nature. 
com/2guqzc8). In the Salar
de Atacama in Chile, part of
South America’s vast ‘lithium
triangle’ of high-altitude
lakes and salt flats, more than
1,700 litres of lithium brines 
are pumped from the shallow 
subsurface every second. This 
intense activity in one of the 
driest areas in the world is 
causing serious friction over 
water rights between local 
communities and mining 
companies and is putting huge 
pressure on a fragile and poorly 
understood ecosystem. 

For example, the region’s 
isolated wetlands are rich in 
species that are unique to the 
area. These are crucial islands 
of habitat for migratory and 
resident birds, including the 
threatened Andean flamingo 
(Phoenicoparrus andinus). 
Harmful cyanobacteria usually 
eaten by these birds accumulate 
in the water polluted by lithium 
extraction, putting human health 
at risk (T. C. Wanger Conserv. 
Lett. 4, 202-206; 2011). 

A Chilean parliamentary 
commission has acknowledged 
that overexploitation of 
groundwater resources has 
damaged ecosystems in the Salar 
de Atacama basin, and that little 
attention has been paid to threats 
from mining (see go.nature. 
com/2mnhuwm, in Spanish). We 
urge the government to rethink 
its policies to account for the 
political, social and ecological 
impact of huge mining projects. 
Jorge S. Gutiérrez, Juan 
G. Navedo Austral University of 
Chile, Valdivia, Chile. 

Andrea Soriano-Redondo 
University of Exeter, Penryn, UK. 
jorgesgutierrez@unex.es 


Risks from extracted 
technology metals 


The global market for rare
metals from Earth’s crust is
on the rise because of their
crucial role in technologies used
in electronics, biomedicine
and the automotive industry,
for example. Most of these
elements are extracted in
African and South American
countries, where environmental
protections are often 
poor. We call for rigorous 
investigation into these metals’ 
environmental concentrations, 
biogeochemical cycles and 
possible risks to human and 
animal health. 
Technology-critical elements 
include tantalum, gallium, 
germanium, indium, niobium, 
tellurium, thallium and other 
scarce metals. Despite their 
economic importance (see, 
for example, A. L. Gulley et al. 
Proc. Natl Acad. Sci. USA 115, 
4111-4115; 2018), little is known 
about their environmental 
effects (M. Filella Earth Sci. 
Rev. 173, 122-140; 2017). For 
example, tantalum increases 
in aquatic organisms at each 
successive level of the food chain 
(W. Espejo et al. Environ. Sci. 
Technol. Lett. 5, 196-201; 2018) 
but it is unclear whether this 
accumulation poses a threat to 
humans and other consumers. 
A better understanding of 
the effects of extraction and 
consumption of technology- 
critical elements will help 
to mitigate the risks to 
environmental and human 
health in producer countries. 
Winfred Espejo University of 
Concepción, Chile.
Cristóbal Galbán-Malagón
Andres Bello University, Santiago, 
Chile. 
Gustavo Chiang Merimoyu 
Ecosystem Research Institute 
Foundation (MERI), Santiago, 
Chile. 
cristobal.galban@unab.cl 


Publish translations 
of Chinese papers 


Language is still a barrier to 
scientific development (see, 
for example, V. S. Lazarev and 
S. A. Nazarovets Nature 556, 
174; 2018). We suggest that the 
best research papers published 
in Chinese or other languages 
(for instance, highly cited 
articles) should be routinely 
translated and republished to 
render them more visible to the 
English-language-dominated
research community.

Since 1979, around 
79 million papers have been 
published in Chinese — 
including in China's highest- 
quality journals, according to 
the China National Knowledge 
Infrastructure databases 
(http://oversea.cnki.net; see 
also Nature 553, 390; 2018). 
Many important advances 
are therefore going unseen by 
Western researchers. 

An example is a landmark 
study by Youyou Tu, who 
shared a Nobel prize in 2015 
for the discovery of artemisinin 
and the treatment of malaria 
(Y. Tu et al. Acta Pharm. Sin. 
16, 366-370; 1981), which 
was cited only once outside 
China. And as of 2 May, all but 
3 of 347 citations of the most- 
cited Chinese-language paper 
in the Web of Science Core 
Collection came from Chinese 
authors. (The paper discusses a 
radioisotope technique that is 
used to date rocks; see F. Y. Wu 
et al. Acta Petrol. Sin. 23, 
185-220; 2007). 

Breakthroughs such as 
Microsoft’s algorithm for 
Chinese-English machine 
translation could speed up 
international sharing of 
Chinese publications (see 
go.nature.com/2jhxuwo). 
Efforts need to focus on which 
papers should be selected for 
translation by engaging with 
publishers, authors and other 
experts, and on resolving 
copyright-ownership issues.
Juan Tao, Chengzhi Ding 
Yunnan University, Kunming, 
China. 

Yuh-Shan Ho Asia University, 
Wufeng, Taichung, Taiwan. 
chzhding@ynu.edu.cn 
ASTROPHYSICS


Pulsars seen through a new lens 


Radio waves produced by cosmic lighthouses called pulsars are distorted by surrounding material. Observations show that 
this material can act as a lens, focusing the waves and boosting the brightness of the pulsar. SEE LETTER P.522 


JASON HESSELS 


Much of what we know about the
Universe comes from using
telescopes to collect and study the
electromagnetic waves produced by astronomical
objects. These waves can be distorted
and absorbed as they travel to Earth, provid- 
ing valuable information about the other- 
wise invisible material that lies between stars 
and between galaxies. Rotating stellar rem- 
nants called pulsars have long been known 
to be excellent probes of interstellar matter.
A pulsar generates a beam of radio waves that
flashes across the sky as the object rotates, just
like the light from a lighthouse. On page 522,
Main et al. report that ionized gas close to a
pulsar can greatly amplify the object's observed 
brightness, and allow astronomers to zoom in 
on the source of the emission. The discovery 
could also help to explain enigmatic astrophys- 
ical signals known as fast radio bursts. 

Main and colleagues targeted a pulsar
that is officially termed PSR B1957+20, but is 
known informally as the black widow (Fig. 1). 
The pulsar and a companion star exist in a 
binary system, in which the two bodies orbit 
around a common centre of mass. The binary 
system itself is minuscule — it would almost 
fit inside the Sun. 

In addition to its radio beam, the pulsar 
generates a strong wind of particles that strips 
matter from the companion star and produces 
a surrounding cloud of ionized and magnet- 
ized gas called a plasma. This is where the 
ominous name of the pulsar comes from — like 
the sexually cannibalistic spider after which it 
is named, the black-widow pulsar is destroying 
its mate. From the point of view of an observer 
on Earth, the pulsar is eclipsed once per orbit, 
when it travels behind the plasma cloud. 

Main et al. used the 305-metre-diameter 
William E. Gordon Telescope at the Arecibo 
Observatory in Puerto Rico to study the black- 
widow pulsar in unprecedented detail. The 
authors meticulously mapped how the observed 
brightness of the pulsar changes on timescales as 
short as microseconds. What they found is that 
the pulsar seems to be much brighter at certain 
points of its orbit — in particular, at the edges 
of the plasma cloud, just before and after the 
pulsar is eclipsed. For only a few milliseconds 
at a time, and at specific radio frequencies, the
pulses can appear to be up to 80 times brighter
than average (see Fig. 2 of the paper).

Figure 1 | The black-widow pulsar. A pulsar is a rotating stellar remnant that generates a powerful
beam of radio waves. Main et al. report observations of a binary system that comprises a pulsar known
as the black widow and a small companion star. A wind of particles from the pulsar strips matter from the
companion star and produces a surrounding cloud of ionized and magnetized gas called a plasma. The
authors demonstrate that the plasma relatively close to the companion star can act as a lens, focusing the
radio waves and boosting the observed brightness of the pulsar.

The authors hypothesize that the edges of 
the plasma cloud act as a lens, boosting the 
observed brightness of the pulsar for short 
periods. Think of how a magnifying glass 
focuses light; in a broader sense, any medium 
that causes electromagnetic waves to change 
direction can serve as a lens. In a plasma lens, 
radio waves are bent, and the waves arriving at 
an observer from different angles can overlap 
to produce a bright spot known as a caustic.
A similar effect with light can be seen at the
bottom of a swimming pool on a sunny day.

Main and colleagues also demonstrated that 
the plasma-lensing effect can be used to zoom 
in on the pulsar, by studying how the effect 
changes with time and with observed radio 
frequency. This is astounding because pulsars 
are no more than about 30 kilometres in diameter.
They produce their radio waves from a
magnetized atmosphere known as a magnetosphere,
which rotates along with the pulsar.
In the case of a fast-spinning pulsar such as
the black widow, the magnetosphere extends
only about 100 km above the pulsar’s surface.

Based on the time- and frequency-depend- 
ent way in which the observed pulses are 
amplified, Main et al. inferred that they could 


detect the physical offset in emissions that 
originate from different regions of the mag- 
netosphere. In other words, the lensing effect 
allowed the authors to measure a kilometre-scale
offset despite the fact that the black-widow
pulsar is about 10¹⁷ km away. Such a feat
is equivalent to measuring the width of a hair
on Mars from Earth.
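
The hair-on-Mars comparison can be checked with a rough small-angle calculation. The sketch below is only an order-of-magnitude illustration: the 10 km offset, the 10¹⁷ km pulsar distance, the 100-micrometre hair width and the average Earth-Mars separation are all assumed round numbers, not values taken from the paper.

```python
def small_angle_rad(size_m: float, distance_m: float) -> float:
    """Angle (in radians) subtended by an object, using the small-angle approximation."""
    return size_m / distance_m


KM = 1_000.0  # metres per kilometre

# Assumed, illustrative inputs (none of these is quoted from the paper).
pulsar_offset_m = 10 * KM        # "kilometre-scale" offset, taken here as 10 km
pulsar_distance_m = 1e17 * KM    # pulsar distance of roughly 10^17 km
hair_width_m = 1e-4              # a human hair, about 100 micrometres across
mars_distance_m = 2.25e8 * KM    # rough average Earth-Mars separation

pulsar_angle = small_angle_rad(pulsar_offset_m, pulsar_distance_m)
hair_angle = small_angle_rad(hair_width_m, mars_distance_m)
ratio = hair_angle / pulsar_angle

print(f"pulsar offset angle: {pulsar_angle:.1e} rad")
print(f"hair-on-Mars angle:  {hair_angle:.1e} rad")
print(f"ratio (hair / pulsar): {ratio:.1f}")
```

Under these assumptions the two angles differ by less than a factor of ten, so the analogy holds at the order-of-magnitude level; different choices for hair width or Mars distance shift the ratio accordingly.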

The black-widow pulsar provides a stunning 
illustration of plasma lensing in action, but 
there are other astrophysical examples of this 
phenomenon. For instance, variations in the 
observed radio brightness of distant super- 
massive black holes called quasars have been 
ascribed to lensing from poorly understood 
plasma structures in our Galaxy. Furthermore,
echoes of pulses from the Crab pulsar
could be caused by lensing from plasma filaments
in the pulsar’s surrounding nebula.
Again, plasma lensing not only boosts the 
brightness of astronomical objects, but also 
reveals what would otherwise be invisible. 

Plasma lensing might allow astronomers to 
peer deeper into the Universe than would oth- 
erwise be possible. More specifically, Main and 
colleagues suggest that their observations pro- 
vide a clue to solving the mystery of fast radio 
bursts — millisecond-duration radio flashes 
that seem to originate in distant galaxies, but 
which are bright enough to detect on Earth.

Intriguingly, the effects observed in the 
black-widow pulsar are similar to distortions 
seen in pulses from at least one source of fast 
radio bursts. Plasma lensing might therefore
be responsible for boosting the brightness
of fast radio bursts, as has previously
been hypothesized and modelled. However,
the story is not complete: the environment 
around a source of fast radio bursts is probably 
quite different from, and possibly even more 
extreme than, that of the black-widow pulsar. 
It perhaps has more in common with the 
environment around the young Crab pulsar 
or at the centre of our Galaxy.

Main et al. have detected plasma lensing in
a pulsar that has been studied for more than
30 years, using a telescope that has been operating
since the early 1960s. Why the sudden
insight? As computing and data-recording
power has grown, so has the ability to use
venerable radio telescopes to scrutinize pulsars
on shorter timescales and over a wider range
of radio frequencies. This suggests that the
future is bright for using pulsars to illuminate
the invisible Universe.


Jason Hessels is at ASTRON (the Netherlands
Institute for Radio Astronomy) and the Anton
Pannekoek Institute for Astronomy, University
of Amsterdam, 1098XH Amsterdam, the
Netherlands.


DEVELOPMENTAL BIOLOGY


Rethinking WNT signalling


The identification of genetic mutations that can hinder the development of 
human limbs has led to the discovery of an unanticipated mode of regulation for 
the WNT signalling pathway during limb development. SEE LETTER P.564 


JESSICA A. LEHOCZKY & CLIFFORD J. TABIN 


After more than 30 years of intense
study, the major intercellular signalling
systems that orchestrate embryonic
development and tissue maintenance are
reasonably well understood. Although there 
are many crucial details yet to be worked out, 
there is a tendency among researchers in the 
field to think that the major players in these 
signalling pathways — and the ways in which 
they interact — are known. On page 564, 
Szenker-Ravi et al. remind us that this is not
necessarily the case. 
WNT proteins are signalling molecules 
whose activity controls many processes, from 
tissue organization and body-axis formation
during embryonic development to maintenance
and regulation of stem cells in adult tissue.
WNTs signal through several distinct intracellular
pathways, but these pathways share an initial
step: WNT molecules outside the cell bind
to and activate receptors of the Frizzled family 
that span the cell membrane. Frizzled receptors 
must be present in the membrane for intracel- 
lular WNT-pathway activation. 

In vertebrates, an auxiliary regulatory 
process is involved in controlling the accu- 
mulation of Frizzled receptors, and hence in 
determining WNT signalling levels. The system
involves three groups of proteins: an
LGR (LGR4, 5 or 6), an extracellular R-spondin
(RSPO1, 2, 3 or 4), and an E3 ubiquitin
ligase enzyme (ZNRF3 or RNF43). In the
absence of RSPOs, the ubiquitin ligase tags
Frizzled receptors with ubiquitin molecules,
which mark the receptors for degradation.
This results in low membrane concentrations
of Frizzled, so WNT signalling is attenuated
(Fig. 1a). Conversely, when RSPOs are present,
they bind to LGRs, and the resulting complex
binds the ubiquitin ligase and prevents it from
tagging Frizzled. The receptors accumulate,
and WNT signalling can occur (Fig. 1b).
This mechanism has a crucial role in
regulating WNT signalling in stem-cell compartments
characterized by LGR expression,
for instance in the intestine and hair follicles.
Because WNT signalling has been extensively 
studied in these contexts, the LGR-RSPO- 
ligase complex has become part of the standard 
picture of how WNT activity is regulated. 
Szenker-Ravi and colleagues’ study began 
with a genetic analysis of five families affected 
by either tetra-amelia syndrome, which is char- 
acterized by the lack of all four limbs and by lung 
abnormalities, or by a previously undescribed 
syndrome involving severe limb malformations. 
The authors found that these two syndromes 
are caused by five mutations in the RSPO2 gene 
that disrupt different protein domains. Using in 
vitro assays, the researchers demonstrated that 
the mutations prevent RSPO2 from binding to 
LGR or RNF43, and so inhibit WNT signalling. 

Figure 1 | Updated model of WNT-signalling regulation. WNT signalling,
which is triggered when WNT proteins bind to Frizzled receptors that span
the cell membrane, is conventionally thought to be regulated by interactions
between three groups of proteins: R-spondins (RSPO1 to 4), LGRs (LGR4
to 6) and ubiquitin ligase enzymes (ZNRF3 or RNF43). a, In the absence of
RSPOs, LGR and ZNRF3 or RNF43 do not interact, and the ubiquitin ligase
catalyses a reaction that marks Frizzled receptors for degradation. WNTs
cannot bind the degraded Frizzled, and WNT signalling is hampered. b,
When present, RSPO binds LGRs and ubiquitin ligase, preventing enzyme
activity. This allows Frizzled receptors to accumulate and so increases WNT
signalling. c, Szenker-Ravi et al. report that, in the developing limb and lung,
RSPO can bind ubiquitin ligase without LGRs. They propose that another,
unidentified, protein enables this interaction, which leads to increased WNT
signalling through an unknown mechanism.

So far, Szenker-Ravi and co-workers’ data
fit nicely with the known roles of RSPOs,
LGRs and ubiquitin ligases. And, like peo- 
ple carrying RSPO2 mutations, mice lacking 
Rspo2 have limb abnormalities. The authors
expected that the loss of LGR activity would 
have the same effect. But they got a surprise 
when they analysed mice lacking the Lgr4, 
5 and 6 genes — the triple-mutant embryos 
did not have limb or lung abnormalities. This 
suggests that, in some tissues, RSPO2 (and 
perhaps other RSPOs) can act independently 
of LGRs, potentiating WNT signalling in the 
absence of its usual binding partner. 

To test this idea directly, the group next 
investigated whether cells isolated from LGR 
triple-mutant embryos are capable of RSPO- 
mediated WNT signalling. They found no 
evidence of WNT signalling when these cells 
were exposed to Rspo1 or Rspo4, but WNT
activity was detected in the presence of Rspo2 
or Rspo3. Thus, RSPO2 and RSPO3 seem to be 
able to induce WNT signalling independently 
of LGRs. However, these RSPOs still seem to 
act through their normal ubiquitin ligase tar- 
gets, because Szenker-Ravi et al. found that 
modulation of ZNRF3 alters WNT signalling 
in triple-mutant cells. Consistent with this pic- 
ture, the authors showed that deletion of rspo2 
in the frog Xenopus laevis led to missing limbs, 
whereas deletion of the znrf3 and rnf43 genes 
led to extra limbs. 

This study demonstrates that the accepted 
model of WNT-receptor modulation does not 
hold in the case of limb and lung development. 
Szenker-Ravi et al. hypothesize that a sepa- 
rate, unidentified receptor is necessary for this 
LGR-independent WNT signalling (Fig. 1c). 
Notably, a study published earlier this year”” 
identified one potential candidate. That work 
showed that RSPO2 and RSPO3 can bind to 
ZNRF3 or RNF43 in conjunction with heparan
sulfate proteoglycan (HSPG) molecules in
lieu of LGRs, to enable WNT signalling in vitro. 
Future work will be required to test whether 
HSPGs play this part in the context of lung and 
limb development. In addition, it remains to 
be determined whether the HSPG-RSPO- 
ZNRF3 complex promotes WNT signalling by 
preventing ZNRF3 activity, or whether another 
mechanism is at work. Either way, it will be 
important to determine the extent of any func- 
tional similarities between LGR- and HSPG- 
based complexes, and to uncover whether there 
is any pattern to the use of LGR or HSPG as a 
cofactor in a particular tissue. 

Szenker-Ravi and colleagues’ work also 
points to ways to broaden our understand- 
ing of processes that require WNT signalling, 
such as limb development. For example, anal- 
ysis of the early stages of limb development in 
frog embryos lacking znrf3 and rnf43 could 
reveal why these mutations lead to extra limbs. 
Do ZNRF3 and RNF43 act as ‘master regula- 
tors’ of limb numbers, as the authors propose? 
Consistent with this idea, WNT activity has 
a role in initiating the formation of the limb 
bud" (which eventually gives rise to the limb). 
Alternatively, rather than being master regulators, these proteins might
mediate limb numbers indirectly. For example, extra limbs might arise as a
secondary consequence of expansion of the pool of limb progenitor cells, or
they might arise because of changes in the formation of a signalling centre
at the tip of the limb bud that directs limb outgrowth (both are
WNT-dependent processes).

Finally, it will be interesting to evaluate LGR- 
independent, RSPO-mediated WNT signalling 
in cancer. Chromosomal abnormalities that 
lead to activation of RSPO2 or RSPO3 have 
been shown to drive WNT-dependent colon 
tumours”. Szenker-Ravi and colleagues’ dem- 
onstration that these two RSPOs can modulate 
WNT activity independent of LGR adds a twist 
to these findings, and should prompt scientists 
to look for cancer-causing mutations in RSPO2 
or RSPO3 in cells outside LGR-expressing cell 
compartments.
Sizing up human
brain evolution 


An innovative computational analysis of factors that might have influenced 
human brain evolution suggests that ecological, rather than social, factors had a 
key role in the evolution of large, rapidly developing brains. SEE LETTER P.554. 


RICHARD MCELREATH 


Most organisms are brainless but
thriving. Brains are expensive to
produce and maintain, and in the
human lineage they have grown so large as to
incur a substantial metabolic burden as the 
brain develops'. A human brain stops grow- 
ing by the age of ten, long before the body 
reaches physical maturity, and this costly and 
fast process of brain growth has been proposed 
to cause a delay in body growth’. Brain growth 
is not given priority in this way in other apes, 
and the human pattern is puzzling because 
it keeps our bodies smaller, more vulnerable 
and less productive for longer. The answer to 
this riddle must lie in how the human brain 
helped our ancestors to survive and reproduce. 
On page 554, Gonzalez-Forero and Gardner²
investigate the role of different factors as pos- 
sible drivers of our unusually large brains, 
and determine how well these factors might 
account for the pattern of changes in brain and 
body size that occur as humans develop. 
Proposals for how large brains evolved in 
humans include ecological, social and cultural 
hypotheses. The ecological-intelligence hypothesis
suggests that environmental challenges,
such as finding food, are paramount in driving
brain-size evolution’. The social-intelligence 
hypothesis suggests instead that the competitive 
and cooperative challenges of living with other 
members of the same species are the key factor”. 
The cultural-intelligence hypothesis combines 
these two ideas, suggesting that the social learn- 
ing of ecologically relevant skills explains the 
extreme brain investment of our lineage’. 
Until now, testing these hypotheses has 
relied mainly on comparative studies that 
correlate data on brain characteristics such as 
size (as an approximation of intelligence) with 
features such as cognition, ecology and group 
living. These regression approaches, which 
seek to identify variables that are associated 
with brain size, have been valuable for refining 
theories and the data measurements needed. 
However, such regression studies can 
generate conflicting and confusing results. 
Changes to brain and body growth can have 
a reciprocal effect on each other for various 
reasons, such as metabolic constraints and 
energy-production needs, so such interactions 
between the brain and the body are complex 
and nonlinear. This makes the results of 
regression studies hard to interpret, because 
they cannot be connected directly to a relevant 
evolutionary model. Researchers in the field
should stop theorizing using one set of mod- 
els while analysing data with another. Moving 
from purely statistical models, such as regres- 
sion approaches, to studies that test evolution- 
ary models could accelerate future progress. 

The study of human brain evolution must 
by necessity be observational, because direct 
experimentation to test the role of variables is 
not an option. But working out what affects 
different components in such observational 
systems is hard. When Ronald Fisher, a leading 
evolutionary biologist and statistician of the 
twentieth century, was asked how one could 
infer causality in such cases, his advice was to 
“make your theories elaborate”’. 

Automobile engineering can provide an 
analogy for studying this type of system. It 
would be difficult to understand racing-car 
design through regression analysis of how 
engine size varies depending on changes in 
other features, such as the mass and shape of 
the car. Instead, a model is needed that uses 
physical laws to predict optimal combinations 
of the variables under different criteria. Under- 
standing brain evolution poses a similar chal- 
lenge in that an organism's features co-evolve 
under biological constraints. 

Gonzalez-Forero and Gardner’s approach 
heeds Fisher's advice because the authors gen- 
erated an elaborate model to investigate brain 
evolution. Modelling brain evolution in this way 
can produce many precise predictions of brain 
size that can easily be falsified. And because the 
model is based on biological characteristics, it is 
easy to learn from it. When the model’s results
do not match the observed evidence of brain 
size, the biological assumptions can be studied 
to understand why the model failed. 

In the authors’ computational set-up, as 
a human individual ages, there is a schedule 
of investment in brain, body and reproduc- 
tive tissue. As individuals grow, an increase in 
brain size allows for an increase in skill, and an 
increase in body size makes it easier to convert 
that skill into energy. The skill boost also aids 
successful reproduction. The model generates 
life-history scenarios that are linked to specific 
predictions of brain and body sizes. 

The metabolic costs of maintaining bodies 
and brains were assigned in the model by using 
previously determined metabolic-scaling rela- 
tionships, which provide information such as 
how the metabolic rate changes depending on 
an organism's size. These metabolic costs were 
fixed in the authors’ model, and the importance 
of different types of challenge was estimated
by varying the weighting of these challenges 
and assessing the subsequent effect on the pre- 
dicted brain and body sizes (Fig. 1). The authors 
explored four types of challenge: ecological (me 
versus nature), cooperative ecological (us versus 
nature), between-individual competitive (me 
versus you), and between-group competitive (us 
versus them). The authors determined which 
combination of challenge weighting gave rise to 
a pattern of hypothetical brain and body growth 
that was most consistent with that observed
during human life history.

Figure 1 | Modelling the evolution of human brain size. Compared with other apes, humans have
distinctively large and rapidly developing brains, and how this human developmental pattern evolved
is debated. Gonzalez-Forero and Gardner² report a computational modelling analysis that investigates
the role of ecological factors and social factors (such as cooperation or competition between individuals)
in driving the evolution of human brain size. The authors’ model predicts human brain and body
size depending on the relative weighting of ecological and social factors. Some examples of challenge
weighting are shown to the left of the corresponding predictions generated in modelling results (data
from Fig. 3 of ref. 2). Comparing such predicted values with the observed average brain and body size of a
female adult enabled the authors to determine the relative importance of the evolutionary drivers, leading
them to identify ecological drivers as being the major determinant of human brain size in their analysis.

Gonzalez-Forero and Gardner’s analysis 
reveals a major role for ecological intelligence 
in driving human brain and body growth in 
this system. The best match between model 
predictions and observed human growth 
patterns came from assigning a weight of 60% 
to ecological challenges in the model. 
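
The weighting search described above can be illustrated with a minimal Python sketch. It is not the authors' model: `toy_predict` and its coefficients are invented placeholders in arbitrary units, and the names `weightings` and `best_weighting` are hypothetical. The sketch only shows the general procedure of trying every combination of challenge weights and keeping the one whose predicted brain and body sizes best match a target. To avoid inventing data, the target is generated by the toy model itself from a planted 60/30/0/10 weighting, so the check is simply that the search recovers what was planted.

```python
from itertools import product

# The four challenge types named in the text.
CHALLENGES = ("ecological", "cooperative", "between_individual", "between_group")

# Invented coefficients for the toy stand-in below (arbitrary units, NOT from the study).
TOY_BRAIN = {"ecological": 1.0, "cooperative": 0.7,
             "between_individual": 1.5, "between_group": 1.3}
TOY_BODY = {"ecological": 1.0, "cooperative": 1.1,
            "between_individual": 0.6, "between_group": 0.75}


def weightings(step=10):
    """Yield every weighting (whole percentages summing to 100) of the four challenges."""
    for combo in product(range(0, 101, step), repeat=len(CHALLENGES)):
        if sum(combo) == 100:
            yield dict(zip(CHALLENGES, combo))


def toy_predict(weights):
    """Hypothetical placeholder for the evolutionary model: returns (brain, body)."""
    brain = sum(TOY_BRAIN[c] * weights[c] for c in CHALLENGES) / 100
    body = sum(TOY_BODY[c] * weights[c] for c in CHALLENGES) / 100
    return brain, body


def best_weighting(predict, observed, step=10):
    """Return the weighting whose predicted (brain, body) is closest to `observed`."""
    def mismatch(weights):
        brain, body = predict(weights)
        # Sum of squared relative errors in brain and body size.
        return (((brain - observed[0]) / observed[0]) ** 2
                + ((body - observed[1]) / observed[1]) ** 2)
    return min(weightings(step), key=mismatch)


# Self-check: plant a known weighting, generate the "observation" from the toy
# model, and confirm that the exhaustive search recovers the planted weighting.
planted = {"ecological": 60, "cooperative": 30,
           "between_individual": 0, "between_group": 10}
target = toy_predict(planted)
recovered = best_weighting(toy_predict, target)
print(recovered)
```

In a real analysis the placeholder would be replaced by the full life-history model and the target by measured brain and body sizes; the exhaustive search over weightings would stay the same.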

By contrast, social challenges were less likely 
to contribute to the observed human growth 
patterns. Competitive challenges between indi- 
viduals or groups are linked to large brains and 
to a body size that is smaller than the observed 
value. In competition, as skill increases, such 
gains in skill can lead to diminishing returns 
in terms of an increase in energy extraction 
because what each individual is competing 
against becomes continually harder to over- 
come. For example, skill increases in one indi- 
vidual could be matched by skill increases in 
other competitors, thereby limiting the energy 
boost from a skill increase. By contrast, the 
challenge itself doesn’t evolve in ecological 
challenges, so overcoming ecological chal- 
lenges can lead to a more-efficient energy 
boost. The best-matched model had a 10% 
weight for between-group competition. 

Cooperation was found to have more of an 
effect. The best-matched model assigns 30% 
weight to cooperative challenges. However, 
cooperation could lead to a reduction in brain 
size because individuals could potentially freeload
on the intelligence of others; evidence of
this effect has been observed in some animals.

Ecological drivers are the clear winner. But 
the model fails to address the possible role 
of cultural intelligence, as the authors admit, 
because cultural dynamics are not included. 
The authors’ results are consistent with the 
cultural-intelligence hypothesis, but any such
possible connection remains speculative.

Some of the model's size-prediction results 
are sensitive to the details, such as the precise 
way in which skill translates into reproduc- 
tive success. However, this provides a valuable 
opportunity to understand previously unno- 
ticed implications of hypotheses about the chal- 
lenges driving brain evolution, and to identify 
targets for future work. For example, the model 
would benefit from more measurements of the 
rate at which skills increase with age, because 
few data of this kind currently exist. 

Finally, because the model aims to explain 
brain size in humans only, the results have no 
clear significance for debates about the evolu- 
tion of intelligence in other species. Never- 
theless, the methodological implications of 
this work are enormous. This type of general 
framework to investigate and predict values 
for constellations of co-evolving variables, 
not only in adults but also throughout life, 
would allow for more-detailed tests of more- 
nuanced predictions, regardless of the species 
of interest.
FORUM ECONOMICS 


The cost of a 
warming climate 


A study finds that meeting climate-change mitigation targets will lead to a
substantial reduction in economic damages. Here, economists present opposing 
views on the approach used by studies such as this one. SEE LETTER P.549 


THE TOPIC IN BRIEF 

• Climate change is already affecting the economy through hurricanes,
droughts and floods.

• On page 549, Burke et al. report that achieving global-warming targets
set by the United Nations could save trillions of dollars in damages.

• The study’s methodology follows previous literature by examining the
short-term effects of weather on the growth rate of gross domestic product
(GDP), the market value of all goods and services produced in a country in
a specific time period.

• These data are then extrapolated into the future to assess the economic
impacts of climate change.

• The validity of this approach has been intensely debated in the
economics community.


Feeling the heat
WOLFRAM SCHLENKER


Estimates of the economic impacts of climate change are essential for the
development of climate policies. Important concerns have been raised about
studies such as that of Burke et al., and more research needs to be carried
out. However, I think that the authors of these studies are doing the best
job possible by basing their estimates on a rigorous analysis and clearly
stating their assumptions.

The extrapolation of the historical relationship between temperature and
GDP into the future raises the question of whether technological advances
might change the predicted trajectory. However, it is worth emphasizing
that such extrapolation is based on an econometric model (an economic model
based on an empirical analysis) that has been shown to be remarkably
consistent between rich and poor countries, as well as between the earlier
and later part of the sample period involved (see, for example, ref. 3).
This makes it unlikely that adaptation measures are already available,
because they have not been deployed in the past even though hot countries
would have benefited from them.

GDP is a useful metric to assess the benefits of limiting global warming.
It provides a measurement of human welfare under the assumption that the
market prices of goods and services fully reflect the costs of their
production and use. In reality, this assumption is not always valid. For
instance, fossil-fuel prices often do not include the costs
associated with global warming and other 
environmental effects on society. However,
even when the analysis focuses only on GDP, huge economic
impacts from limiting global warming are
predicted. These estimates would be even
bigger if the non-market benefits of reduced 
fossil-fuel use — for example, for human 
health and ecosystems — were considered. 

The predicted impacts are larger than those 
obtained in earlier work’. The main reason is 
that strong effects of weather on the growth 
rate of GDP are found, whereas the earlier 
work stipulated, but did not empirically test, 
that weather affects only the level of GDP in a
particular year. Heat and drought, for exam- 
ple, directly influence agricultural yields in 
a given year, but have limited impact in the 
following years. By contrast, a growth effect 
implies that a destructive weather event not 
only decreases GDP in a given year, but also 
lowers the value for all future years. 

The main innovation of studies such as that 
of Burke and colleagues is to use an econo- 
metric model that can incorporate both 
level and growth effects, without favouring 
one type of effect over the other’. This is 
accomplished by including both the current 
temperature and the temperatures in previous 
periods in the analysis. 

If weather affected GDP only in a given 
year, immediate impacts would be offset in the 
future — for example, a 1% decrease in GDP 
would be offset by a 1% increase in GDP the 
following year. On the contrary, the authors 
of these studies find that the impacts are not 
offset, but rather amplified. The one caveat is 
that when temperatures in previous periods 
are included in the analysis, the uncertainties 
in the projected economic damages increase
substantially. Resolving this issue is a key
direction for future research.
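
The distinction between level and growth effects can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the specification used by Burke and colleagues: log GDP growth is assumed to respond linearly to a temperature shock in the current year and in the previous year, and the function name `gdp_path`, the 2% baseline growth and the coefficient values are all hypothetical. When the lag coefficients sum to zero, the loss in the shock year is reversed the following year (a level effect); when they do not, GDP stays below its old path in every later year (a growth effect).

```python
import math


def gdp_path(temp_shocks, lag_coefs, base_growth=0.02, start=100.0):
    """Return yearly GDP levels when log growth responds to current and lagged shocks.

    Growth in year t = base_growth + sum over k of lag_coefs[k] * temp_shocks[t - k].
    All parameter values are hypothetical and chosen only for illustration.
    """
    log_gdp = math.log(start)
    path = [start]
    for t in range(len(temp_shocks)):
        growth = base_growth
        for k, coef in enumerate(lag_coefs):
            if t - k >= 0:
                growth += coef * temp_shocks[t - k]
        log_gdp += growth
        path.append(math.exp(log_gdp))
    return path


years = 10
# A single unit temperature shock in year index 2, and no shocks otherwise.
one_hot_year = [1.0 if t == 2 else 0.0 for t in range(years)]

baseline = gdp_path([0.0] * years, [0.0])
# Level effect: lag coefficients sum to zero, so the loss is offset a year later.
level_effect = gdp_path(one_hot_year, [-0.01, 0.01])
# Growth effect: lag coefficients sum to -0.01, so the loss is never recovered.
growth_effect = gdp_path(one_hot_year, [-0.01, 0.0])

print(round(level_effect[-1] / baseline[-1], 4))
print(round(growth_effect[-1] / baseline[-1], 4))
```

Comparing the final-year ratios shows the point made in the text: under a level effect GDP returns to its baseline path, whereas under a growth effect the shortfall carries through to all future years.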
Grand damage
projections 
MAXIMILIAN AUFFHAMMER 


Translating the impacts of climate change
on surface temperature, precipitation and 
sea level into economic damages is challeng- 
ing. One approach is to use longitudinal data 
(repeated measurements of the same variables) 
to estimate damage functions — mathemati- 
cal expressions that translate physical impacts 
into monetary damages’. These functions are 
associated with specific locations and sectors, 
such as agriculture or manufacturing. A major 
drawback is that coverage across locations and 
sectors is incomplete. Studies such as that of 
Burke et al. circumvent this problem by using 
GDP to aggregate economic impacts across 
sectors. Nevertheless, several issues must be 
considered before strong conclusions can be 
drawn from this work. 

First, the authors of these studies argue 
that societal adaptation to climate change is 
accounted for statistically. However, what is 
incorporated stems from a historical cross- 
country comparison of the temperature 
sensitivity of GDP. In reality, future adaptation 
will probably involve innovative technologies 
with lower costs than those that are currently 
used. Such technologies might include, for 
example, air conditioners powered by carbon- 
free electricity that are more energy efficient 
than present-day devices. Adaptation could 
therefore result in lower economic damages 
than are predicted. 

Second, on a larger scale, climate change 
will lead to a planetary redistribution of eco- 
nomic activity, which will result in a redistri- 
bution of international trade flows. Such an 
effect is impossible to quantify credibly and 
could have a large impact on the projected 
damages. 

Third, GDP includes only goods and 
services that are transacted in markets and 
therefore have measurable prices. It does not 
capture the effects of climate change on valu- 
able non-market sectors, such as ecosystem 
services and biodiversity. 

Finally, allowing climate change to influence 
both the level and the growth rate of GDP is 
shown to lead to a wide distribution of projected
impacts. As a result, neither small (or
zero) effects nor massive effects can be ruled 
out. Attempts to distinguish between these two 
possibilities using simple statistical models at
the global level have been inconclusive. 
There are good reasons for thinking 
that some effects of climate change might 
be cumulative. For instance, climate and 
weather will affect the level, and potentially 
the growth rate and efficiency, of capital and 
labour. Furthermore, climate might induce 
technological change through both adaptation
and mitigation measures. Pinning down
these macroeconomic processes to resolve just
how large the effects of climate will be on the
long-term growth of GDP needs to be a high
priority for future work.


NEURODEGENERATION


Sabotage by the brain’s
supporting cells 


Several neurodegenerative disorders are linked to the build-up of abnormal 
α-synuclein protein in distinct cell types. It emerges that differing intracellular
factors dictate the properties of this protein in each cell type. SEE LETTER P.558 


LARY C. WALKER 


One of the many mysteries surrounding neurodegenerative diseases is how
they can manifest in such a variety of ways. Disorders such as Alzheimer’s
disease, Parkinson’s disease and amyotrophic lateral sclerosis (also known
as motor neuron disease) are each defined by a core set of nervous-system
abnormalities, but every affected person’s brain responds slightly
differently. Moreover, although each of these disorders is associated with
the abnormal accumulation of a different protein in or around cells, some
protein aggregates can give rise to more than one neurodegenerative
disease. How can this happen? On page 558, Peng et al. present persuasive
evidence that different types of cell accumulate structurally distinct
forms of one protein, α-synuclein. By shaping the 3D architecture of the
corrupted protein, the cell type helps to determine the nature of the
resulting disease.

Most normal proteins fold into characteristic conformations that are
strongly governed by the protein’s amino-acid sequence. But in age-related
neurodegenerative conditions, certain proteins misfold, and induce other
proteins of the same type also to misfold and to stick to one another. In
this way, the abnormal molecular structure propagates by means of a
crystallization-like process called seeded protein aggregation. One such
protein is α-synuclein. Under normal circumstances, α-synuclein is located
mainly in nerve terminals. But in some cases, the protein forms
intraneuronal aggregates called Lewy bodies and Lewy neurites, for instance
in Parkinson’s disease and a condition known as dementia with Lewy bodies,
which are collectively referred to as Lewy body diseases (LBDs). In a
more-aggressive brain disorder called multiple system atrophy (MSA),
misfolded α-synuclein accumulates mostly in neuron-supporting cells called
oligodendrocytes, in clumps known as glial cytoplasmic inclusions.

Why α-synuclein aggregates are mainly found in different cell types in MSA
and LBDs has been uncertain. It cannot be attributed to differences in
amino-acid sequence, because α-synuclein is not typically mutated in the
common form of either condition. However, previous work has shown that
aberrant α-synuclein in LBDs is structurally and functionally different
from that in MSA. These variant molecular states are known as protein
strains. When injected into the brains of susceptible mice, the MSA strain
causes a fatal disease similar to human MSA. By contrast, injecting the
LBD strain fails to induce major signs of disease in this model.

Peng et al. set out to investigate the causes behind this difference in
α-synuclein potency. The authors first confirmed that protein aggregates
in the oligodendrocytes of people with MSA are conformationally distinct
from those in neurons from people who have LBDs. In MSA, a few neurons do
harbour α-synuclein aggregates, but the researchers found that these
aggregates display the LBD conformation; thus, the two strains can occupy
the same brain, albeit in different cell types. Next, the team exposed
cultured cells to each strain, and found that MSA-derived α-synuclein is
approximately 1,000 times more potent at inducing aggregation than is the
LBD-derived protein.

The authors then injected the two types of aggregated α-synuclein (called
seeds) into the brains of wild-type mice. This in vivo experiment
confirmed that MSA-derived seeds are much more effective than seeds derived
from LBDs at seeding aggregation. However, the seeds instigated
aggregation only in neurons, not in oligodendrocytes.

Why might this be the case? Oligodendrocytes normally produce little, if
any, α-synuclein. The authors therefore genetically engineered mice to
express α-synuclein only in oligodendrocytes. They found that α-synuclein
aggregation could be induced in oligodendrocytes in these mice using seeds
from either the MSA or the LBD strain, but again, the MSA strain was much
the more potent. Importantly, the aggregates that emerged were always the
MSA strain, regardless of the type of seed injected. Finally, when Peng et
al. exposed synthetic, unaggregated α-synuclein to cellular material that
had been isolated from ruptured oligodendrocytes or neurons, the
oligodendrocytic homogenate caused the protein to aggregate into a strain
of greater seeding potency than did the neuronal homogenate. Some as yet
unidentified factors in oligodendrocytes, then, seem to drive the
formation of the MSA strain.

Figure 1 | Distinct strains of α-synuclein protein. In Parkinson’s
disease and dementia with Lewy bodies (collectively referred to as Lewy
body diseases; LBDs), a misfolded form of α-synuclein called the LBD
strain aggregates mainly in neurons to form anomalous structures
called Lewy bodies and Lewy neurites (not shown). But in a disease
known as multiple system atrophy (MSA), a different strain of misfolded
α-synuclein forms aggregates called glial cytoplasmic inclusions (GCIs)
in oligodendrocytes, non-neuronal cells that normally produce
the fatty insulation for neuronal projections. Peng et al. show that
differences in the intracellular environments of the two cell types are
responsible for the formation of the two strains.

Collectively, the authors’ experiments show 
that oligodendrocytes produce a structural 
variant of α-synuclein that robustly seeds
aggregation in both oligodendrocytes and 
neurons. However, the cell type strongly 
influences the molecular strain of the aggre- 
gates that forms: oligodendrocytes specifically 
generate the MSA strain, whereas neurons 
preferentially produce the LBD strain (Fig. 1). 
The authors conclude that cell-type-specific 
factors regulate the molecular architecture 
and distinctive harmful properties of aberrant 
α-synuclein aggregates. They further suggest
that the robust seeding capacity of the MSA 
strain contributes to the aggressive clinical 
progression of MSA. 

The recognition that different cells generate 
different disease-causing protein strains raises 
a wealth of questions. An especially perplex- 
ing problem is how such large amounts of 
α-synuclein end up in oligodendrocytes in
MSA, given that these cells do not produce 
much of the protein’. Possible mechanisms 
include augmented oligodendrocytic expression
of α-synuclein in the disease state, or the
uptake of α-synuclein that has been released
from neurons”°. 

MSA can affect different brain systems in 
people with the condition’. Peng and col- 
leagues’ findings demonstrate the possible 
role of cell composition in these differences, 
but what other factors might be involved? Perhaps
the metabolism of α-synuclein in neurons
and oligodendrocytes varies between brain 
regions. Or maybe the regional vulnerability 
to α-synuclein deposition is dictated mainly by
where seeding first occurs. Two other types of 
neuron-supporting cell, astrocytes and micro- 
glia, are also involved in MSA, but their roles 
in the disease remain to be defined”". Finally, 
the role of small, toxic α-synuclein assemblages
called oligomers", which might act in differ- 
ent brain regions from the larger aggregates, 
needs further exploration. Clarification of 
these issues could improve our understanding
not just of α-synuclein diseases, but also of
other neurodegenerative disorders that involve 
protein aggregation. 

In sum, the authors’ work highlights the 
interplay between a cell’s composition and pro- 
teins that can cause disease. Identification of 
the cell-specific factors that impel α-synuclein
to aggregate into the MSA strain could reveal 
ways to treat this debilitating and ultimately 
fatal brain disorder. Peng and colleagues’ 
findings are also a salutary reminder of the 
salient part played by non-neuronal cells in 
both the health and failure of the nervous 
system.
Subtype switch foils
pancreatic tumours 


Mutations in the gene KDM6A drive an aggressive subtype of pancreatic cancer
by causing repositioning of an enzyme complex that modifies histone proteins 
associated with DNA, leading to altered gene expression. 


FIEKE FROELING & DAVID TUVESON 


Pancreatic ductal adenocarcinoma
(PDAC) is one of the most deadly
cancers in Western society because
it tends to be diagnosed late and responds 
poorly to therapy. PDAC tumours fall into 
two main groups: a ‘classical’ subtype, and a
more aggressive ‘squamous’ subtype in which
pancreatic cells fail to undergo proper differen- 
tiation. The squamous subtype often involves 
mutations in members of the COMPASS-like 
complex — a group of methyltransferase and 
demethylase enzymes that, respectively, add 
or remove methyl groups from lysine amino- 
acid residues on histone proteins, around 
which DNA is packaged. Such histone modi- 
fication can lead to changes in the expres- 
sion of histone-associated genes involved in 
pancreatic-cell differentiation. Writing in 
Cancer Cell, Andricovich et al.° demonstrate 
the role of mutations in one member of this 
complex, KDM6A, in driving the squamous 
PDAC subtype. 

The KDM6A gene is found on the X chromosome,
and, in males, the presence of a
KDM6A mutation can co-occur with mutation 
of a related gene on the Y chromosome, UTY. 
Andricovich et al. found that KDM6A and UTY 
mutations were associated with the squamous 
subtype of PDAC, and with shortened length of 
patient survival. They then used mouse mod- 
els to confirm that functional KDM6A acts 
to suppress the development of PDAC. Mice 
harbouring Kdm6a gene mutations developed
aggressive, poorly differentiated squamous 
tumours that showed protein- and gene-
expression patterns characteristic of human
tumours of the squamous subtype. These
defects were more pronounced in females than 
in males, consistent with the fact that females 
carry two copies of Kdm6a in their cells and
males have one copy of Kdm6a on the X chromosome
and Uty on the Y.

KDM6A is a demethylase that removes a 
methyl modification dubbed H3K27me3 from 
lysine residue 27 on histone H3 proteins. But 
Andricovich and colleagues found that only a 
small percentage of H3K27me3 marks were 
altered in cells from Kdm6a-mutant mice, 
compared with controls. This observation 
led the authors to posit that altered KDM6A 
demethylase activity was not the driver of 
Kdm6a-mutant PDAC. Instead, they found
that loss of Kdm6a resulted in changes in other 
histone modifications, specifically at groups 
of gene-regulatory DNA sequences called 
super-enhancers, whose activation promotes 
the expression of certain genes that are highly 
expressed in PDAC. 

In particular, the researchers observed 
changes in the distribution across super-
enhancers of a different methyl modification
(dubbed H3K4me1) and a modification
involving acetyl groups at lysine 27 of his- 
tone H3 (H3K27ac). Such changes were associ- 
ated with a repositioning of the COMPASS-like 
complex to these regions. The authors found 
that the altered histone modifications led to 
activation of some super-enhancers, and, in 
some cases, to an increased reach — an abil- 
ity to regulate distant genes that the super- 
enhancer cannot normally influence. These 
findings indicate that KDM6A exerts its 
tumour-suppressive role not only through 
its demethylase activity, but also by altering 
Figure 1 | KDM6A protein in pancreatic cancer. a, Pancreatic ductal adenocarcinomas (PDACs) can
be categorized into two main subtypes: classical tumours and more-aggressive, squamous tumours.
Andricovich et al.’ have provided evidence from mice and humans that mutations in the gene KDM6A 
cause changes in the patterns of molecular modifications to histone proteins, around which DNA is 
packaged. This histone remodelling leads to the expression of genes associated with squamous PDAC. 
However, the authors show that treatment with a small molecule called JQ1 prevents this subtype switch. 

b, This finding adds to the list of PDAC subgroups that can be targeted with drug treatments. Most PDAC 
tumours involve mutations in the gene KRAS, which cannot be targeted by drugs, but some KRAS wild- 
type tumours lack this mutation and carry other mutations that can be targeted. In addition, two subgroups 
of KRAS-mutant tumours carry defects in DNA-repair pathways, which can be targeted by different drugs. 


the position of the COMPASS-like complex, 
enabling other enzymes to modify histones. 
The authors also showed that the increase in 
the reach and activation of super-enhancers 
led to the activation of genes involved in 
squamous-subtype-like differentiation. 
Because Kdm6a-mutant PDAC in mice was 
not associated with significant H3K27me3 
demethylation, the authors hypothesized 
that the alternative functions of the aberrant 
COMPASS-like complex promoted PDAC, 
and might therefore be vulnerable to drug 
treatment. This hypothesis is supported by 
the fact that mutant UTY, which helps to drive 
PDAC in males, lacks demethylase activ- 
ity. Andricovich et al. therefore analysed the 
ability of various drugs that target other his- 
tone modifications to prevent the growth of 
KDM6A-deficient human cancer cells in vitro. 
The authors found that cells harbouring 
mutations in KDM6A or other genes of the 
COMPASS-like complex were highly sensitive 
to inhibitors of BET-family proteins. These 
proteins bind to histone lysine residues that 
have been modified by acetyl groups, and 
recruit the cell’s transcriptional machinery to 
promote gene expression. Various studies® have 
shown that BET inhibitors can displace the BET 
protein BRD4 from acetylated lysines at super- 
enhancer regions, thereby reducing the expres- 
sion of cancer-causing genes (oncogenes) such 
as MYC. Because KDM6A mutations lead to 
altered lysine acetylation at super-enhancers, 
it makes sense that these drugs could be effec- 
tive in this setting. Indeed, Andricovich et al. 
showed that the BET inhibitor JQ1 decreased 
BRD4 binding to the super-enhancers that 
regulate MYC and other oncogenes, and so 
decreased the expression of these genes. 
Finally, the authors demonstrated that this 
drug treatment was also effective in vivo. The 
tumours of Kdm6a-deficient mice treated with 
JQ1 were smaller than those of mice that did not 
receive the drug, and had well-differentiated 


features typical of the classical PDAC subtype. 
This indicates that BET inhibitors have the 
potential to reverse the effects of the histone- 
modification remodelling that occurs in the 
squamous subtype (Fig. 1a). Targeting histone 
modifications and altered gene-regulatory 
networks to cause a ‘class switch’ to a more differentiated,
less aggressive subtype of cancer
might provide a promising therapeutic strategy. 
In support of the idea that modulating these 
factors can alter cancer progression, other stud- 
ies have shown that enhancer reprogramming 
and large-scale losses of DNA methylation play 
a part in the spread of cancer’®. 

Our increased understanding of the molec- 
ular underpinnings of cancer has hugely 
improved treatments for many tumours, 
although in PDAC the relative lack of obvious 
drug targets has presented a challenge. There 
are some cases of PDAC that involve onco- 
genes for which inhibitors do exist’. However, 
most PDAC tumours harbour oncogenic
mutations in the gene KRAS, for which inhibi- 
tors are not available. But there are two clear 
groups of people with KRAS-mutant PDAC 
tumours characterized by deficiencies in spe- 
cific DNA-repair pathways that can be targeted 
by drugs’ (Fig. 1b). Patients harbouring 
KDM6A mutations (and possibly other muta- 
tions in genes of the COMPASS-like complex) 
might represent another subgroup, who would 
benefit from therapies targeting BET function. 
Moreover, BET inhibitors could have broader 
activity if combined with other inhibitors of 
histone remodelling, as previously reported”. 

It is to be hoped that more molecular bio- 
markers will soon be discovered that, like 
KDM6A mutations, can predict tumour
responsiveness to a particular therapy. This 
research avenue provides cause for optimism 
that improved outcomes for people with pancreatic
cancer will be the norm, and not the exception,
in the near future.
Plasmon propagation
pushed to the limit 


Excitations called plasmons have the potential to miniaturize photonic devices, 
but are often short-lived. Microscopy reveals that plasmons in the material 
graphene can overcome this limitation at low temperatures. SEE LETTER P.530 


JUSTIN C. W. SONG 


Light can be confined and steered at the
nanoscale using collective oscillations of
electrons known as plasmons. But just as
death and taxes are the only certainties in life,
energy loss is the only certainty in plasmonics.


The tighter the confinement of light, the 
shorter the lifetime of the plasmons’ — a trade- 
off that is a major hurdle in the practical use of 
these oscillations. On page 530, Ni et al.” use 
a technique called scanning near-field opti- 
cal microscopy to study plasmons in a single 
layer of carbon atoms known as graphene, 
at cryogenic temperatures
(60 kelvin). The authors show 
that the plasmons can produce 
extremely compact light con- 
finement while retaining long 
lifetimes. They use their results 
to determine the fundamental 
limits of plasmon propagation 
in graphene. 

The propagation of light 
involves the oscillation of elec- 
tric and magnetic fields. This 
oscillation defines the relation- 
ship between the frequency 
and wavelength of light, and 
underpins the diffraction limit 
— the fact that, in free space, 
light spreads out if it passes 
through a region narrower 
than its wavelength. When 
light interacts with plasmons, 
its speed can be substantially 
reduced, which allows it to be 
confined to distances much 
smaller than its free-space 
wavelength. As a result, plas- 
mons have become a versatile 
tool for controlling the behaviour of light at 
the nanoscale*. However, the same light-plas- 
mon interaction that can confine light below 
its wavelength also enables energy to be lost 
through the scattering of electrons. 

Noble metals such as silver and gold are 
conventionally used in plasmonics, but suf- 
fer from high losses. In the past few years, 2D 
materials have become promising alterna- 
tives*. In the case of graphene, plasmons can 
compress light to distances as small as one- 
three-hundredth of the light’s free-space wave- 
length’. Furthermore, the electron density of 
graphene can be readily controlled, which 
provides direct electrical means of tuning 
the properties of the plasmons. But although 
sustained efforts to improve the quality of 
graphene have yielded steady advances’, plas- 
mon loss remains substantial. 

In a bid to push the limits of plasmon propagation,
Ni and colleagues launched and imaged
plasmons in a device containing high-quality 
graphene at cryogenic temperatures. The use 
of these temperatures minimized losses caused 
by temperature-sensitive processes, such as 
the scattering of electrons from mechanical 
vibrations called phonons. The authors cus- 
tomized an instrument known as a scanning 
near-field optical microscope so that it could 
operate at cryogenic temperatures. Although 
these instruments are routinely used to study 
plasmons at room temperature, operating 
them at lower temperatures has been difficult. 

Ni et al. used the narrow metallic tip of 
the microscope to launch plasmons in the 
graphene device. They then scanned the tip 
across the device to image the interference 
pattern produced by the plasmons as they 
reflected from the edges of the device and 
from microstructures present on the device's 
Figure 1 | Low-temperature plasmons investigated in graphene. Ni et al.’ used

an instrument known as a scanning near-field optical microscope to study exotic 
excitations called plasmons. They used the narrow metallic tip of the microscope 

to launch plasmons in a device containing the material graphene, at cryogenic 
temperatures (60 kelvin). The authors then scanned the tip across the device to 

image the interference pattern produced by the plasmons as they were reflected 
(white arrows) from the edges of the device and from microstructures present on the 
device's surface. The interference pattern consisted of bright and dark bands that were 
found throughout the device, demonstrating that the plasmons could travel several 
micrometres before their energy was lost. Such long-lived plasmons could have many 
applications. Scale bar, 1 µm. (Adapted from Fig. 1c of ref. 2.)


surface (Fig. 1). This technique is particu- 
larly useful because it launches plasmons in 
the interior of the device, which limits losses 
caused by interaction with the device's edges. 
Such losses can be large in other approaches’. 

The fruits of Ni and colleagues’ labour are 
pronounced plasmon interference fringes 
(bright and dark bands) that are found 
throughout the device and that extend sev- 
eral micrometres from any boundaries. These 
fringes make the entire device ‘light up’ with 
a characteristic washboard-like pattern. The 
plasmons simultaneously have relatively long 
lifetimes (reaching 1.6 picoseconds; 1 ps is 
10⁻¹² seconds) and confine light to distances
smaller than one-sixtieth of the free-space 
wavelength. Their quality factor, a measure of 
energy retention, is 130, which is a record for 
plasmons that enable such compact light con- 
finement. The performance of the plasmons 
therefore bucks the trade-off between tight 
confinement and high loss. It is possible 
thanks to the extremely high quality of the 
authors’ graphene device, which contains 
highly mobile electrons that can travel several 
micrometres without scattering. 

Remarkably, using a combination of detailed 
modelling and systematically collected 
temperature-dependent data, Ni and col- 
leagues determined that the main cause of 
energy loss at low temperatures was not elec- 
tron scattering in the graphene. Instead, plas- 
mon loss arose mostly from insulating material 
that surrounded the graphene. The quality of 
the plasmons could therefore be improved by 
reducing these extrinsic losses, for example by 
altering this insulating material. The authors 
also suggest that the intrinsic (fundamental) 
limits of plasmon propagation at cryogenic 
temperatures have not yet been reached. 




They calculate that it might 
be possible to achieve quality 
factors more than seven times 
higher than the one reported 
in the current paper. 

Nevertheless, the excep- 
tional quality of Ni and col- 
leagues’ graphene plasmons 
sets a new standard for nano- 
photonic platforms. Tightly 
confined light in such plas- 
mons can now be thought of 
as being highly stable, with 
the ability to be directed and 
steered across distances of 
several micrometres. The 
possibilities for the future are 
vast and range from the fun- 
damental (such as probing the 
topological® and geometrical 
structure’ of plasmons) to the 
applied (including nanoscale 
plasmon lasers’, sensitive light 
detectors, sub-wavelength 
routing of light, and nanoscale 
optical interconnects’). The 
authors’ high-quality graphene 
plasmons, combined with recently developed 
techniques to substantially reduce the overall 
size of plasmons"’, make a compelling case for 
graphene-based nanophotonics. 

Perhaps most exciting, however, is the 
prospect of using scanning near-field optical 
microscopy at cryogenic temperatures to probe 
excitations other than plasmons. Phases of 
matter such as superconductors, ferromagnets 
and antiferromagnets possess excitations that 
could be accessed using this technique”. In the 
past few years, a wide range of these phases has 
been discovered on 2D materials, on which the 
surfaces are fully exposed and are therefore eas- 
ily accessible. Such phases manifest only at low 
temperatures, making cryogenic operation the 
key to launching the excitations and studying 
their intricate dynamics.
Mutant phenotypes for thousands of
bacterial genes of unknown function 


Morgan N. Price, Kelly M. Wetmore, R. Jordan Waters, Mark Callaghan, Jayashree Ray, Hualan Liu, Jennifer V. Kuehl,
Ryan A. Melnyk, Jacob S. Lamson, Yumi Suh, Hans K. Carlson, Zuelma Esquivel, Harini Sadeeshkumar, Romy Chakraborty,
Grant M. Zane, Benjamin E. Rubin, Judy D. Wall, Axel Visel, James Bristow, Matthew J. Blow, Adam P. Arkin
& Adam M. Deutschbauer


One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate 
the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of 
growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with 
a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in 
that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly 
annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By 
combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in 
addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein 
families. Our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations. 


Thousands of bacterial genomes have been sequenced, revealing the 
predicted amino acid sequences of millions of proteins. Only a small 
proportion of these proteins have been studied experimentally and the 
functions of most proteins have been predicted from their similarity 
to experimentally characterized proteins. However, about one-third of 
bacterial proteins are not similar enough to any characterized protein 
to be annotated by this approach'. Furthermore, these predictions are 
often incorrect, as homologous proteins may have different substrate 
specificities”. This sequence-to-function gap represents a growing 
challenge for microbiology, because new bacterial genomes are being 
sequenced at an ever-increasing rate, while experimental protein 
characterization continues to be relatively slow’. 

One approach for investigating the function of an unknown protein 
is to assess the consequences of a loss-of-function mutation of the 
corresponding gene under multiple conditions**. The mutant 
phenotypes can be combined with comparative genomics to provide 
evidence-based annotations for a fraction of the proteins**. Transposon 
mutagenesis followed by sequencing (TnSeq) measures mutant pheno- 
types genome-wide from a single experiment in which tens of thou- 
sands of different mutants are grown together”®. Coupling TnSeq with 
random DNA barcoding of each mutant (RB-TnSeq) makes it easier to 
measure phenotypes across many conditions’. Here, we use RB-TnSeq 
to address the sequence-to-function gap by systematically exploring 
the mutant phenotypes of thousands of genes from each of 32 bacteria 
under multiple experimental conditions (Fig. 1a). 


Mutant fitness compendia for 32 bacteria 

To perform a systematic assessment of mutant phenotypes across a 
diverse set of bacteria, we studied 32 genetically tractable bacteria 
representing six different bacterial divisions and 23 different genera. In 
addition to 30 aerobic heterotrophs, we also studied a strictly anaerobic 
sulfate-reducing bacterium (Desulfovibrio vulgaris) and a strictly photo- 
synthetic cyanobacterium (Synechococcus elongatus) (Fig. 1b). We 


generated a randomly barcoded transposon mutant library in each 
of the 32 bacteria, ten of which have been previously described? . 
For each mutant population, we used TnSeq to generate genome-wide 
maps of transposon insertion locations. Genes that have very few or no 
transposon insertions are likely to be essential for viability, or nearly so, 
in the conditions that were used to select the mutants (Supplementary 
Note 1). We identified 289-614 genes per bacterium that are likely to 
encode essential proteins (Fig. 1b; Supplementary Table 1). 

To identify conditions for mutant fitness profiling, we tested the 
growth of the 30 aerobic heterotrophic bacteria in a range of condi- 
tions, including the utilization of 94 different carbon sources and 45 
different nitrogen sources, and their inhibition by 34-55 stress com- 
pounds including antibiotics and metals (Supplementary Tables 2-4). 
In the typical mutant fitness experiment, we grew a pool of mutants 
for 4-8 generations and used DNA barcode sequencing’? to compare 
the abundance of the mutants before and after growth (Fig. 1a). We
defined gene fitness to be the log₂ change in abundance of mutants in
that gene during the experiment (Fig. 1a). For example, a gene fitness 
value of —2 means that the strains with transposon insertions in that 
gene dropped to 25% of their initial relative abundance by the end of the 
experiment, whereas a fitness value of 0 means that their relative abun- 
dance was unchanged. Genes with insufficient coverage were excluded 
from our data set (‘no data’ in Fig. 1b) and only protein-coding genes 
were considered. Including all replicates, we conducted a total of 4,870 
genome-wide fitness experiments that met our criteria for biological 
and internal consistency”, representing 26-129 experimental condi- 
tions for each bacterium (Fig. 1b, Supplementary Table 5). 
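
The definition of gene fitness given above can be illustrated with a minimal sketch. The helper names (`gene_fitness`, `remaining_fraction`) and the input values are hypothetical, and the sketch ignores any normalization or strain-averaging steps that a real analysis pipeline would need; it only shows the log₂ arithmetic described in the text.

```python
import math


def gene_fitness(abundance_before: float, abundance_after: float) -> float:
    """Log2 change in relative abundance of the mutants in one gene.

    Hypothetical helper: both inputs are assumed to be relative
    abundances that are already normalized and strictly positive.
    """
    if abundance_before <= 0 or abundance_after <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(abundance_after / abundance_before)


def remaining_fraction(fitness: float) -> float:
    """Fraction of the initial relative abundance implied by a fitness value."""
    return 2.0 ** fitness


# A fitness of -2 corresponds to a drop to one quarter (25%) of the
# initial relative abundance; a fitness of 0 means no change.
print(remaining_fraction(-2))   # 0.25
print(gene_fitness(8.0, 2.0))   # -2.0
print(gene_fitness(5.0, 5.0))   # 0.0
```

Working on a log₂ scale makes a fourfold depletion and a fourfold enrichment symmetric (−2 and +2), which is convenient when comparing detrimental and important genes.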

To illustrate the consistency of our data with known protein func- 
tions, we examined fitness data for the three most common classes of 
experiments: carbon utilization, nitrogen utilization, and stress. For 
the utilization of D-fructose or 4-hydroxybenzoate as the sole source 
of carbon in Cupriavidus basilensis, the fitness data identified expected 
proteins for the catabolism of each substrate (Fig. 1c). Similarly, the 
Fig. 1 | High-throughput genetics for 32 bacteria. a, Our approach
for measuring gene fitness. b, For each bacterium, we show the types of 
conditions that we studied and how many genes had statistically significant 
mutant phenotypes (using a t-like test statistic? and FDR < 0.05). c, Gene 
fitness during the utilization of two carbon sources by C. basilensis. 


fitness data identified the key enzymes and transporters required for 
the utilization of D-alanine or cytosine as the sole nitrogen source in
Azospirillum brasilense (Extended Data Fig. 1a). Finally, in Shewanella 
loihica, orthologues of the CzcCBA heavy metal efflux pump"4 and 
the zinc-responsive regulator ZntR were important for fitness in the 
presence of an elevated concentration of zinc (Extended Data Fig. 1b). 
For each condition, we also identified proteins that were previously 
not known to be involved in the respective processes, including an 
efflux pump that is important for 4-hydroxybenzoate utilization by 
C. basilensis (Fig. 1c, Supplementary Table 6). 

We computed a t-like test statistic for each gene fitness value’ and 
identified a statistically significant mutant phenotype in at least one 
condition for 30% of the genes for which we collected fitness data 
(Fig. 1b). Eighteen per cent of all genes with fitness measurements 
were significantly detrimental to fitness (fitness > 0) in at least one 
condition (Extended Data Fig. 2), which is consistent with previous 
See Supplementary Table 6 for details on the highlighted genes. The
4-hydroxybenzoate data are the average from two biological replicates. 
d, We classified the genes from all 32 bacteria by how informative their 
annotations were, and for each class, we show how many genes have each 
type of phenotype. 


reports that many genes are detrimental in some conditions*. Genes 
annotated with a TIGRFAM!* functional role (TIGR role) were 
particularly likely to have statistically significant phenotypes, with 
more than half of those with fitness data (52%) showing a significant 
phenotype (Fig. 1d). By contrast, genes with vague annotations that are 
not specific (such as ‘transporter’) or with functionally uninformative 
annotations (such as ‘hypothetical protein’ or ‘membrane protein’) were 
less likely to have phenotypes (28% or 20%, respectively). Nevertheless, 
our assays identified phenotypes for 11,779 genes that are not annotated 
with a detailed function (Fig. 1d), including 4,135 genes that encode 
proteins that do not belong to any characterized family in either Pfam 
or TIGRFAMs'®!”, 


Conserved functional associations 
To gain insight into the biological functions of individual proteins from 
the mutant fitness data, we used two strategies: (1) identification of 
Fig. 2 | Identification of conserved phenotypes. a, An example of a
conserved and specific phenotype. Each point shows the fitness of crcB in 
an experiment, with fluoride stress experiments highlighted in red. Values 
less than —4 are shown at —4. The y-axis is random. b, The fraction of 
E. coli genes with a specific phenotype in defined media whose encoded 
proteins are directly involved in the uptake or catabolism of the compound 
or regulation thereof. For this analysis, we examined all 61 E. coli genes 
with conserved specific-important phenotypes and a random sample of 
40 of the other genes with specific-important phenotypes. The confidence 
interval is from the binomial test. c, The number of proteins of each 
type that had a conserved specific phenotype or a specific phenotype. 


‘specific phenotypes’ that are observed only under one or a small number
of conditions in a given bacterium; and (2) ‘cofitness’ patterns,
where multiple genes in a bacterium show similar fitness profiles 
across all conditions. Furthermore, we identified conserved specific 
phenotypes and conserved cofitness by comparing the data from 32 
bacteria, and we tested the reliability of these conserved associations 
for understanding protein function. 

To assign specific phenotypes, we identified genes that had 
|fitness| > 1 and |t| >5 in an experiment but had little phenotype in 
most other experiments. For example, the fluoride efflux protein CrcB'® 
is specifically important for fitness under elevated fluoride stress in 
eight bacteria, but not for fitness in any of the hundreds of other experi- 
mental conditions that we tested (Fig. 2a shows data from five bacteria). 
Among all genes with a significant phenotype under any condition, 
33% had a specific phenotype. We considered a specific phenotype 
to be conserved if a similar protein in another bacterium (a putative 
orthologue) had a similar phenotype. 
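
The thresholds for a specific phenotype can be sketched as follows. The cut-offs |fitness| > 1 and |t| > 5 are taken from the text. How "little phenotype in most other experiments" is quantified is not stated here, so the parameters `background_cutoff` and `max_fraction_strong` are assumptions chosen only for illustration, as is the function name.

```python
from typing import List, Sequence


def specific_experiments(
    fitness: Sequence[float],
    t_stat: Sequence[float],
    background_cutoff: float = 1.0,     # assumption: what counts as "a phenotype"
    max_fraction_strong: float = 0.05,  # assumption: "most other experiments" are quiet
) -> List[int]:
    """Return indices of experiments in which a gene has a specific phenotype.

    A gene qualifies in an experiment if |fitness| > 1 and |t| > 5 there
    (thresholds from the text), provided that only a small fraction of all
    experiments show a sizeable fitness effect (illustrative assumption).
    """
    if len(fitness) != len(t_stat):
        raise ValueError("fitness and t_stat must have the same length")
    if not fitness:
        return []
    strong = [
        i for i, (f, t) in enumerate(zip(fitness, t_stat))
        if abs(f) > 1 and abs(t) > 5
    ]
    n_with_effect = sum(1 for f in fitness if abs(f) > background_cutoff)
    if n_with_effect / len(fitness) > max_fraction_strong:
        return []  # too pleiotropic to call the phenotype specific
    return strong


# Hypothetical gene: neutral in 99 experiments, strongly important in one.
fit = [0.1] * 99 + [-3.0]
t = [0.5] * 99 + [-8.0]
print(specific_experiments(fit, t))  # [99]
```

A gene with strong effects in half of all experiments would return an empty list under these assumed settings, because its phenotype is broad rather than specific.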

As catabolic processes in Escherichia coli are well understood, we 
compared the specific-important phenotypes (fitness <0) on carbon 
and nitrogen sources to the EcoCyc database!®. We found that many 
of these gene-condition associations resulted from the gene's direct 
involvement in the uptake or catabolism of the compound or in the 
d, Comparison of fitness values for ptsP and npr from E. coli (n = 162
independent experiments) and S. oneidensis (n = 176 independent 
experiments). The experiments are colour-coded by type. r is the linear 
correlation. e, Using TIGR subroles to test the accuracy of gene-gene 
associations. For each query gene with a TIGR subrole, we asked how 
often the most cofit gene with a TIGR subrole had the same subrole as the 
query gene, across varying levels of cofitness or conserved cofitness for 
that most cofit gene. The confidence interval is from the binomial test. 

f, The number of proteins of each type that had at least one association 
from conserved cofitness (r > 0.6 in both bacteria) or else from cofitness 
(r>0.8). 


regulation of those processes (Supplementary Table 7). If the specific 
phenotype was conserved, then the association was much more likely 
to be direct (Fig. 2b; P < 10⁻⁴, Fisher’s exact test).

We identified specific phenotypes and conserved specific phenotypes 
for genes of all annotation classes (Fig. 2c). In particular, specific pheno- 
types linked 3,927 genes with vague or hypothetical annotations to 
over 100 conditions, including 82 carbon sources, 43 nitrogen sources, 
and 54 stresses. 

Our second strategy for gaining insight into a protein’s function 
was based on the observation that genes with related functions often 
have similar fitness patterns across multiple conditions, which we 
term cofitness. Cofitness is the Pearson (linear) correlation of all 
of the fitness values for a pair of genes in the same bacterium, and 
cofitness across dozens of conditions has already been shown to be 
useful for understanding protein function*~>”°. For example, Npr and 
PtsP of the nitrogen phosphotransferase system (PTS) in E. coli were 
cofit (Fig. 2d), probably because PtsP phosphorylates and activates 
Npr’!. We defined conserved cofitness as a pair of orthologous genes 
that had high cofitness in more than one bacterium, regardless of the 
conditions. For example, the orthologues of Npr and PtsP (SO1332 
and SO3965) also have high cofitness in Shewanella oneidensis, but
they have phenotypes in different conditions than in E. coli (Fig. 2d). 
Phenotypic variability among orthologous genes has been reported
before”’. 
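
Because cofitness is defined as the Pearson (linear) correlation of the fitness values of two genes, it can be computed with a few lines of code. The sketch below is a plain-Python illustration with made-up fitness profiles; the function name `cofitness` is hypothetical and the inputs are assumed to be fitness values for the same experiments in the same order.

```python
import math
from typing import Sequence


def cofitness(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between the fitness profiles of two genes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical genes whose mutants are depleted in the same experiments.
gene_a = [0.0, -1.0, -2.0, 0.0, -0.5]
gene_b = [0.0, -2.0, -4.0, 0.0, -1.0]
print(round(cofitness(gene_a, gene_b), 6))  # 1.0
```

Note that the correlation depends only on the shape of the two profiles, not on the magnitude of the effects, so two genes can be highly cofit even if one has much weaker phenotypes than the other.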

To test how accurately cofitness or conserved cofitness linked 
together functionally related genes, we determined for each query 
gene whether its functional role (TIGR subrole’®) could be accurately 
predicted by the roles of its cofit genes. We found that high-scoring 
cofitness in a single bacterium led to predictions of TIGR subroles that 
were mostly correct, but the accuracy decayed rapidly as the cofitness 
score decreased (Fig. 2e). By contrast, for conserved cofitness, the 
decay was much slower (Fig. 2e). Furthermore, conserved cofitness 
was significantly more accurate for a given number of predictions: for 
example, the top 2,000 predictions from cofitness (r > 0.86 for gene 
pairs from one bacterium) had 63% agreement in TIGR subroles, while 


the top 2,000 predictions from conserved cofitness (r > 0.66 for gene 
pairs from both bacteria) had 74% agreement (P< 107 1 Fisher’s exact 
test). Conserved cofitness may be more predictive because it filters out 
cases of spurious cofitness between functionally unrelated genes in one 
bacterium. Using thresholds of r> 0.8 for cofitness or r>0.6 for con- 
served cofitness, we identified at least one association from cofitness 
or conserved cofitness for 15% of the genes with fitness data and for 
44% of genes with statistically significant phenotypes. We identified 
cofitness associations for all types of proteins (Fig. 2f), including for 
4,773 vaguely annotated or hypothetical proteins. 
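
The two thresholds quoted above can be combined into a simple decision rule. This is a sketch under stated assumptions: the function name and return labels are hypothetical, and `r_orthologues` stands for the cofitness of the orthologous gene pair in a second bacterium, which is assumed to have been computed already.

```python
from typing import Optional


def association_type(
    r_pair: float, r_orthologues: Optional[float] = None
) -> Optional[str]:
    """Classify a gene pair using the cofitness thresholds from the text.

    r_pair:        cofitness of the two genes in the bacterium of interest.
    r_orthologues: cofitness of their orthologues in another bacterium,
                   or None if no orthologous pair is available (assumption).
    """
    # Conserved cofitness: r > 0.6 in both bacteria.
    if r_orthologues is not None and r_pair > 0.6 and r_orthologues > 0.6:
        return "conserved cofitness"
    # Otherwise a stricter cut-off applies within a single bacterium.
    if r_pair > 0.8:
        return "cofitness"
    return None


print(association_type(0.70, 0.65))  # conserved cofitness
print(association_type(0.85))        # cofitness
print(association_type(0.70))        # None
```

The lower cut-off for conserved pairs reflects the argument in the text that requiring agreement in a second bacterium filters out spurious correlations.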

Overall, we identified a functional association (either a specific 
phenotype or high cofitness) for 25,276 genes, of which 13,192 
(52%) had conserved functional associations. Among the genes with 
Fig. 3 | Genetic overviews for a condition or a class of proteins.

a, Overview of conserved specific phenotypes in cisplatin stress across 28 
bacteria. The data are the average for all successful cisplatin experiments 
for each bacterium, at up to five different concentrations. Each row 

shows an orthologue group formed by greedy clustering of orthologues 
(bidirectional best BLAST hits with 80% coverage). Some families are split 
into multiple orthologue groups and are marked with square brackets 
(dinG, uvrA). Some genes have more pleiotropic phenotypes in some 

(but not all) bacteria. b, Overview of specific phenotypes for the utilization 
of D-xylose as a carbon source in 12 bacteria. Putative orthologues are
included in the heatmap even if they are not important for D-xylose
utilization. In the oxidative pathway, gene names for xylBCDX are from
Stephens et al.*? and should not be confused with E. coli xylB, which is 
not related. The data are the average of 1-3 replicate experiments for each 
bacterium. The colour scale is the same as in a. c, Summary of annotation 
improvements for ABC transporter proteins based on an analysis of 
specific phenotypes. 
conserved associations, 10,699 (81%) had conserved associations across
genera and 7,811 (59%) had conservation across divisions. For 2,316 
genes with hypothetical or otherwise vague annotations, we identified 
conserved and hence high-confidence associations (Supplementary 
Table 8). 


Genetic overviews of cellular processes 

Genome-wide mutant fitness profiling of diverse bacteria provides 
a broad genetic overview of each biological condition studied. For 
example, cisplatin reacts with DNA to form crosslinks that block 
DNA replication, so we expected that genes encoding DNA repair 
proteins would be specifically important for growth during cisplatin 
stress’. Indeed, of the 67 protein families that had specific, important, 
and conserved phenotypes on cisplatin, 33 are known to be involved 
in DNA repair, including the UvrABC nucleotide excision repair 
complex (Fig. 3a, Supplementary Table 9). Three of these proteins have 
recently been shown to be involved in DNA repair: RadD (YejH)*’, 
MmcB (DUF1052)*4, and FAN1-like VRR-NUC domain protein”, and
individual mutants in these genes were sensitive to cisplatin (Extended 
Data Figs. 3, 4). Seven of the other characterized families that had 
conserved sensitivity to cisplatin are involved in cell division or 
chromosome segregation (Fig. 3a), probably because DNA damage 
can inhibit cell division and lead to filamentous cells*°. Among the 
remaining 27 families, we predicted that eight poorly understood 
families were novel DNA repair families because of their domain 
content or because of regulation by the DNA damage response regulator 
LexA’’~®, including the nuclease EndA (Supplementary Note 2). An 
endA deletion strain is sensitive to cisplatin (Extended Data Fig. 4) and 
the catalytic residue of the nuclease domain of EndA is important for 
cisplatin resistance (Extended Data Fig. 5). 

We obtained similar genetic overviews for many metabolic 
processes. For example, we examined D-xylose catabolism, which
we assayed as the sole carbon source in 12 bacteria. We found that 
XylAB was important for xylose utilization in E. coli and in eight other 
bacteria, confirming its central and conserved role (Fig. 3b). By 
contrast, the well-characterized E. coli XylR regulator and XylF 
transporter are not required in each of the other eight bacteria: 
two Pseudomonads use alternative transport proteins for D-xylose, 
whereas Phaeobacter inhibens and Sinorhizobium meliloti require a 
LacI-like regulator for D-xylose utilization, as previously predicted 
for P. inhibens (PGA1_c13990)³⁰. Three bacteria use an oxidative 
pathway for D-xylose utilization³¹⁻³³ instead of the XylAB pathway 
(Supplementary Table 10). 

Fig. 4 | Conserved functional associations for genes encoding 
uncharacterized protein families. a, b, Conserved specific phenotypes 
for members of uncharacterized protein families UPF0126 and UPF0060. 


More accurate gene annotations 

Large-scale mutant fitness data can also be used to improve our under- 
standing of proteins that have been annotated with a general biochemical 
function but lack substrate specificity. To illustrate this, we used the 
mutant fitness data to systematically re-annotate the substrate specifici- 
ties of 101 permease subunits of ABC transporters that had strong and 
specific-important phenotypes (fitness < −2) during the utilization of 
diverse carbon or nitrogen sources (Fig. 3c, Supplementary Table 11). 
Using the fitness data, we predicted substrates for all of these proteins 
(Supplementary Note 3). For 24 of the 50 ABC transport proteins that had 
already been annotated with a substrate (48%), the annotation was incor- 
rect or did not include all of the substrates. Our data also provided more 
specific annotations: for example, Ac3H11_2942 and Ac3H11_2943 from 
Acidovorax sp. GW101-3H11 were annotated as transporting ‘various 
polyols’, whereas our data show that they are important for using the 
polyol D-sorbitol but not the polyol D-mannitol. Overall, we improved 
the annotations for 75 of 101 transport proteins (Fig. 3c). 

Next we examined transport proteins with specific-important pheno- 
types in carbon or nitrogen source experiments and catabolic enzymes 
with specific-important phenotypes in carbon experiments, and identi- 
fied instances in which the mutant fitness data led to a new annotation. 
In total, we re-annotated 456 proteins that were annotated vaguely or 
incorrectly in either KEGG³⁴ or SEED³⁵: 238 transport proteins (this 
includes 68 of the ABC transport proteins described above) and 218 
catabolic proteins (Supplementary Table 12). Of these 456 proteins, 287 
(63%) were not annotated correctly by either KEGG or SEED. Most of 
the re-annotated proteins are homologous to characterized enzymes 
but were too distant for the correct substrate to be identified compu- 
tationally. We also identified a number of proteins that we could link 
to novel enzymatic reactions (Supplementary Note 4, Supplementary 
Figs. 1-3). For example, we identified the putative Pseudomonas gene 
encoding glucosaminate ammonia-lyase, a known biochemical activity 
with no known gene³⁶. 


Insights for uncharacterized families 

We identified a conserved functional association for 335 genes that 
encode representatives of 87 different domains of unknown function 
(DUFs)¹⁷ (Supplementary Table 13). We examined the phenotypes of 
77 of these DUFs, and we propose broad functional annotations for 
13 of them and specific molecular functions for an additional eight 
(Supplementary Note 5). For example, proteins containing UPF0126 
are specifically important for glycine utilization in 11 bacteria (Fig. 4a 
shows the data from five bacteria). As UPF0126 is predicted to 
be a membrane protein, we propose that it is a glycine transporter. 
Individual mutants of three members of this family had reduced growth 
on glycine (Extended Data Fig. 6) and PGA1_c00920 partially rescues 
the glycine growth defect of an E. coli strain that lacks the glycine 
transporter CycA³⁷ (Extended Data Fig. 7). Second, we found that genes 
encoding members of the UPF0060 family of predicted transmembrane 
proteins have a conserved specific phenotype under thallium stress 
in four bacteria (Fig. 4b). Consequently, we propose that UPF0060- 
containing proteins may function as a thallium-specific efflux pump. In 
support of this hypothesis, the expression of UPF0060 proteins confers 
thallium resistance on E. coli (Extended Data Fig. 8). Third, we found 
that in three bacteria, genes encoding DUF2849-containing proteins 
were cofit with an adjacently encoded sulfite reductase, CysI (Fig. 4c). 
Sulfite reductase is important for sulfate assimilation, which is the 
only source of sulfur in our defined medium. Thus, DUF2849 is also 
presumably involved in the same process. The three bacteria containing 
DUF2849 lack CysJ, which typically provides the electron source for 
CysI. As other bacterial genomes that contain DUF2849 also contain 
cysI but not cysJ, we propose that DUF2849 is an alternate electron 
source for sulfite reductase. Finally, we found that the uncharacterized 
proteins YeaH and YcgB and the poorly characterized protein kinase 
YeaG³⁸,³⁹ had high cofitness across seven different bacteria, but with 
varying phenotypes across bacteria (Supplementary Figs. 4, 5). Given 
the protein kinase activity of YeaG, we propose that these three proteins 
act together in a conserved signalling pathway that is required for 
distinct cellular functions in different bacteria. 


Discussion 

We identified mutant phenotypes for more than ten thousand poorly 
annotated genes from 32 bacteria. By manually combining these func- 
tional associations with comparative sequence analysis, we proposed 
specific functions for transporter proteins, catabolic enzymes, and 
DUFs, and we identified putative novel DNA repair families. Most of 
these predictions require additional experimental validation. To facilitate 
further analyses of mutant phenotypes and protein sequences, we 
developed the Fitness Browser (http://fit.genomics.lbl.gov). 

A major challenge in extending our results to all bacterial proteins 
is their incredible diversity. We identified functional associations for 
potential orthologues of just 12% of all bacterial proteins that lack 
detailed annotations (Supplementary Note 6, Extended Data Fig. 9). 
Improving this coverage will require a larger effort to generate mutants 
in more diverse bacteria: our study included representatives of only six 
of the approximately forty divisions of bacteria that have been culti- 
vated so far. In summary, our study demonstrates the scale with which 
bacterial fitness data can be collected and the utility of these data to 
provide insights into the functions of many proteins. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0124-0. 


Received: 5 October 2016; Accepted: 9 April 2018; 
Published online: 16 May 2018 


1. Chang, Y.-C. et al. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps. Nucleic Acids Res. 44, D330-D335 (2016). 
2. Schnoes, A. M., Brown, S. D., Dodevski, I. & Babbitt, P. C. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLOS Comput. Biol. 5, e1000605 (2009). 
3. Deutschbauer, A. et al. Towards an informative mutant phenotype for every bacterial gene. J. Bacteriol. 196, 3643-3655 (2014). 
4. Deutschbauer, A. et al. Evidence-based annotation of gene function in Shewanella oneidensis MR-1 using genome-wide fitness profiling across 121 conditions. PLoS Genet. 7, e1002385 (2011). 
5. Nichols, R. J. et al. Phenotypic landscape of a bacterial cell. Cell 144, 143-156 (2011). 
6. Price, M. N. et al. The genetic basis of energy conservation in the sulfate-reducing bacterium Desulfovibrio alaskensis G20. Front. Microbiol. 5, 577 (2014). 
7. Langridge, G. C. et al. Simultaneous assay of every Salmonella typhi gene using one million transposon mutants. Genome Res. 19, 2308-2316 (2009). 
8. van Opijnen, T., Bodi, K. L. & Camilli, A. Tn-seq: high-throughput parallel sequencing for fitness and genetic interaction studies in microorganisms. Nat. Methods 6, 767-772 (2009). 
9. Wetmore, K. M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. MBio 6, e00306-15 (2015). 
10. Liu, H. et al. Magic pools: parallel assessment of transposon delivery vectors in bacteria. mSystems 3, e00143-17 (2018). 
11. Rubin, B. E. et al. The essential gene set of a photosynthetic organism. Proc. Natl Acad. Sci. USA 112, E6634-E6643 (2015). 
12. Melnyk, R. A. et al. Novel mechanism for scavenging of hypochlorite involving a periplasmic methionine-rich peptide and methionine sulfoxide reductase. MBio 6, e00233-15 (2015). 
13. Smith, A. M. et al. Quantitative phenotyping via deep barcode sequencing. Genome Res. 19, 1836-1842 (2009). 
14. Rensing, C., Pribyl, T. & Nies, D. H. New functions for the three subunits of the CzcCBA cation-proton antiporter. J. Bacteriol. 179, 6871-6879 (1997). 
15. Hottes, A. K. et al. Bacterial adaptation through loss of function. PLoS Genet. 9, e1003617 (2013). 
16. Haft, D. H. et al. TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 41, D387-D395 (2013). 
17. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222-D230 (2014). 
18. Baker, J. L. et al. Widespread genetic switches and toxicity resistance proteins for fluoride. Science 335, 233-235 (2012). 
19. Keseler, I. M. et al. EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res. 41, D605-D612 (2013). 
20. Hillenmeyer, M. E. et al. Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol. 11, R30 (2010). 
21. Rabus, R., Reizer, J., Paulsen, I. & Saier, M. H. Jr Enzyme I(Ntr) from Escherichia coli. A novel enzyme of the phosphoenolpyruvate-dependent phosphotransferase system exhibiting strict specificity for its phosphoryl acceptor, NPr. J. Biol. Chem. 274, 26185-26191 (1999). 
22. van Opijnen, T., Dedrick, S. & Bento, J. Strain dependent genetic networks for antibiotic-sensitivity in a bacterial pathogen with a large pan-genome. PLoS Pathog. 12, e1005869 (2016). 
23. Chen, S. H., Byrne, R. T., Wood, E. A. & Cox, M. M. Escherichia coli radD (yejH) gene: a novel function involved in radiation resistance and double-strand break repair. Mol. Microbiol. 95, 754-768 (2015). 
24. Lopes-Kulishev, C. O. et al. Functional characterization of two SOS-regulated genes involved in mitomycin C resistance in Caulobacter crescentus. DNA Repair (Amst.) 33, 78-89 (2015). 
25. Gwon, G. H. et al. Crystal structure of a Fanconi anemia-associated nuclease homolog bound to 5′ flap DNA: basis of interstrand cross-link repair by FAN1. Genes Dev. 28, 2276-2290 (2014). 
26. Justice, S. S., Hunstad, D. A., Cegelski, L. & Hultgren, S. J. Morphological plasticity as a bacterial survival strategy. Nat. Rev. Microbiol. 6, 162-168 (2008). 
27. da Rocha, R. P., Paquola, A. C. de M., Marques Mdo, V., Menck, C. F. M. & Galhardo, R. S. Characterization of the SOS regulon of Caulobacter crescentus. J. Bacteriol. 190, 1209-1218 (2008). 
28. Abella, M., Campoy, S., Erill, I., Rojo, F. & Barbé, J. Cohabitation of two different lexA regulons in Pseudomonas putida. J. Bacteriol. 189, 8855-8862 (2007). 
29. Cirz, R. T., O’Neill, B. M., Hammond, J. A., Head, S. R. & Romesberg, F. E. Defining the Pseudomonas aeruginosa SOS response and its role in the global response to the antibiotic ciprofloxacin. J. Bacteriol. 188, 7101-7110 (2006). 
30. Wiegmann, K. et al. Carbohydrate catabolism in Phaeobacter inhibens DSM 17395, a member of the marine roseobacter clade. Appl. Environ. Microbiol. 80, 4725-4737 (2014). 
31. Brouns, S. J. J. et al. Identification of the missing links in prokaryotic pentose oxidation pathways: evidence for enzyme recruitment. J. Biol. Chem. 281, 27378-27388 (2006). 
32. Johnsen, U. et al. D-xylose degradation pathway in the halophilic archaeon Haloferax volcanii. J. Biol. Chem. 284, 27290-27303 (2009). 
33. Stephens, C. et al. Genetic analysis of a novel pathway for D-xylose metabolism in Caulobacter crescentus. J. Bacteriol. 189, 2181-2185 (2007). 
34. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457-D462 (2016). 
35. Overbeek, R. et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206-D214 (2014). 
36. Iwamoto, R. & Imanaga, Y. Direct evidence of the Entner-Doudoroff pathway operating in the metabolism of D-glucosamine in bacteria. J. Biochem. 109, 66-69 (1991). 
37. Ghrist, A. C. & Stauffer, G. V. The Escherichia coli glycine transport system and its role in the regulation of the glycine cleavage enzyme system. Microbiology 141, 133-140 (1995). 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


38. Figueira, R. et al. Adaptation to sustained nitrogen starvation by Escherichia coli requires the eukaryote-like serine/threonine kinase YeaG. Sci. Rep. 5, 17524 (2015). 
39. Tagourti, J., Landoulsi, A. & Richarme, G. Cloning, expression, purification and characterization of the stress kinase YeaG from Escherichia coli. Protein Expr. Purif. 59, 79-85 (2008). 


Acknowledgements We thank V. Lo, W. Shao, and K. Keller for technical 
assistance with the Fitness Browser website. Sequencing was performed at: the 
Vincent J. Coates Genomics Sequencing Laboratory (University of California 
at Berkeley), supported by NIH S10 Instrumentation Grants S10RR029668, 
S10RR027303, and OD018174; the DOE Joint Genome Institute; the College 
of Biological Sciences U-DNA Sequencing Facility (UC Davis); and the Institute 
for Genomics Sciences (University of Maryland). Studies of novel isolates were 
conducted by ENIGMA and were supported by the Office of Science, Office of 
Biological and Environmental Research of the US Department of Energy, under 
contract DE-AC02-05CH11231. The other data collection was supported by 
Laboratory Directed Research and Development (LDRD) funding from Berkeley 
Laboratory, provided by the Director, Office of Science, of the US Department 
of Energy under contract DE-AC02-05CH11231 and a Community Science 
Project from the Joint Genome Institute to M.J.B., J.B., A.P.A., and A.M.D. The 
work conducted by the US Department of Energy Joint Genome Institute, a DOE 
Office of Science User Facility, is supported by the Office of Science of the US 
Department of Energy under contract no. DE-AC02-05CH11231. 


ARTICLE 


Author contributions A.M.D., A.P.A., M.N.P., M.J.B., and J.B. conceived the 
project. A.M.D., A.P.A., M.J.B., and J.B. supervised the project. A.M.D. led 
the experimental work. A.M.D., K.M.W., R.J.W., R.A.M., M.C., J.R., J.V.K., H.L., 
H.K.C., J.S.L., Y.S., Z.E., and H.S. collected data. R.C. isolated bacteria. M.N.P. 
and A.M.D. analysed the fitness data. R.A.M., R.J.W., and M.N.P. assembled 
genomes. B.E.R. provided resources and advice on S. elongatus experiments. 
G.M.Z. and J.D.W. generated gene deletion mutants in Pseudomonas stutzeri 
RCH2. A.V. edited the manuscript and provided advice. M.N.P., M.J.B., and 
A.M.D. wrote the paper. 


Competing interests The authors declare no competing interests. 


Additional information 

Extended data is available for this paper at https://doi.org/10.1038/s41586-018-0124-0. 

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-018-0124-0. 

Reprints and permissions information is available at http://www.nature.com/reprints. 

Correspondence and requests for materials should be addressed to M.J.B., 
A.P.A., & A.M.D. 

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 


NATURE | www.nature.com/nature 




METHODS 


Bacteria for high-throughput genetics. We attempted to make transposon mutant 
libraries in more than 100 bacteria as part of this study, including representatives 
of Proteobacteria, Bacteroidetes, Firmicutes, Actinobacteria, and Planctomycetes; 
we present data from the 32 bacteria to which we successfully applied RB-TnSeq 
(Supplementary Table 14). Eight bacteria were isolated from groundwater 
collected from different monitoring wells at the Oak Ridge National Laboratory 
Field Research Center (FRC; https://public.ornl.gov/orifc/orfrc1_fieldchallenge.cfm), 
and five have not been described previously: Acidovorax sp. GW101-3H11, 
Pseudomonas fluorescens FW300-N1B4, P. fluorescens FW300-N2E3, P. fluorescens 
FW300-N2C3, and P. fluorescens GW456-L13. Acidovorax sp. GW101-3H11 was 
isolated as a single colony on a Luria-Bertani (LB) agar plate grown at 30 °C using 
an inoculum from FRC well GW101. P. fluorescens FW300-N1B4, P. fluorescens 
FW300-N2E3, and P. fluorescens FW300-N2C3 were all isolated at 30 °C under 
anaerobic denitrifying conditions with acetate, propionate, and butyrate as the 
carbon source, respectively, using an inoculum from FRC well FW300. P. fluorescens 
GW456-L13 was isolated from FRC well FW456 under anaerobic incubations 
on an LB agar plate. We previously described the isolation of P. fluorescens FW300- 
N2E2", C. basilensis 4G11*!, and Pedobacter sp. GW460-11-11-14-LB5!°. The 
eight FRC isolates have been submitted to the Leibniz Institute DSMZ (German 
Collection of Microorganisms and Cell Cultures GmbH). 

Our study included seven strains of Pseudomonas and four strains of Shewanella 
because of their genetic tractability. The different strains in these genera have large 
differences in gene content. For example, the two most closely related strains we 
studied were P. fluorescens FW300-N2E2 and FW300-N2C3, and the typical pair of 
orthologous proteins from these strains has 97% amino acid identity. Nevertheless, 
each genome contains over 1,000 predicted protein-coding genes that do not have 
orthologues in the other strain. 

Strain construction for single gene studies. The oligonucleotides and gBlocks 
used in this study are listed in Supplementary Table 15, the plasmids and details 
on their construction in Supplementary Table 16, and the single strains for follow- 
up studies in Supplementary Table 17. We constructed kanamycin-marked gene 
deletions in Pseudomonas stutzeri RCH2 using a previously described double 
homologous recombination strategy”. We used a similar strategy to construct 
kanamycin-marked deletions in Dinoroseobacter shibae and P. inhibens, except 
that we delivered the deletion constructs by conjugation from E. coli. We constructed 
markerless gene deletions of SO4008 and SO1319 from S. oneidensis MR-1 
and Pf6N2E2_4800 and Pf6N2E2_4801 from P. fluorescens FW300-N2E2 using 
sacB counter-selection. For genetic complementation experiments, we cloned the 
genes into the broad range vector pBBR1-MCS5*. For the thallium resistance 
experiments, we cloned members of UPF0060 into plasmid pFAB2286. For E. coli 
BW25113, we used single-gene deletions from the Keio collection. For S. oneidensis 
MR-1, we also used transposon mutants that had been individually sequenced*. 
Unless indicated otherwise, we tested the growth phenotypes of these individual 
strains in 96-well microplates at 30 °C using the standard rich or minimal defined 
growth medium for each bacterium. These growth assays were performed in a 
Tecan microplate reader (either Sunrise or Infinite F200) with absorbance readings 
(OD600) every 15 min. 

Media and standard culturing conditions. A full list of the media used in this 
study and their components is given in Supplementary Table 18. We routinely cul- 
tured Acidovorax sp. GW101-3H11, A. brasilense Sp245, Burkholderia phytofirmans 
PsJN, Dyella japonica UNC79MFTsu3.2, E. coli BW25113, Herbaspirillum seropedicae 
SmR1, Klebsiella michiganensis M5a1, all Pseudomonads and Shewanellae, 
S. meliloti 1021, and Sphingomonas koreensis DSMZ 15582 in LB. Caulobacter 
crescentus NA1000 was typically cultured in PYE medium. C. basilensis 4G11 and 
Pedobacter sp. GW460-11-11-14-LB5 were typically cultured in R2A medium. 
Dechlorosoma suillum PS was cultured in ALP medium”. D. vulgaris Miyazaki 
F was grown anaerobically in lactate-sulfate (MOLS4) medium, as previously 
described*>*°, We used marine broth (Difco 2216) for standard culturing of 
D. shibae DFL-12, Echinicola vietnamensis, Kangiella aquimarina SW-154T, 
Marinobacter adhaerens HP15, P. inhibens BS107, and Pontibacter actiniarum. 
S. elongatus PCC 7942 was normally cultured in BG-11 medium with either 7,000 
or 9,250 lx. All bacteria were typically cultured at 30 °C except E. coli BW25113 
and Shewanella amazonensis SB2B, which were cultured at 37 °C, and P. inhibens 
BS107, which was grown at 25 °C. The E. coli conjugation strain WM3064 was 
cultured in LB medium at 37 °C with diaminopimelic acid (DAP) added to a final 
concentration of 300 µM. 

High-throughput growth assays of wild-type bacteria. To assess the phenotypic 
capabilities of 30 aerobic heterotrophic bacteria and to identify conditions suitable 
for mutant fitness profiling, we monitored the growth of the wild-type bacteria in 
a 96-well microplate assay. These prescreen growth assays were performed in a 
Tecan microplate reader (either Sunrise or Infinite F200) with absorbance readings 
(OD600) every 15 min. All 96-well microplate growth assays contained 150 µl 
culture volume per well at a starting OD600 of 0.02. We used the grofit package 
in R*’ to analyse all growth curve data in this study. For carbon and nitrogen 
source utilization, we tested 94 and 45 possible substrates, respectively, in a defined 
medium (Supplementary Tables 2, 3). We classified a bacterium as positive for 
usage of a particular substrate if (1) the maximum OD600 on the substrate was at 
least 1.5× greater than the average of the water controls and the integral under the 
curve (spline.integral) was 10% greater than the average of the water controls or (2) 
a successful genome-wide fitness assay was collected on the substrate, as described 
below. We included the second criterion because our automated scoring of the 
wild-type growth curves was conservative and did not include all conditions used 
for genome-wide fitness assays. 
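
The first scoring criterion above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study (growth curves were analysed with grofit in R). The function name, argument names and example values are hypothetical, and "10% greater" is read here as an integral of at least 1.10 times the mean of the water controls.

```python
from statistics import mean

def uses_substrate(max_od, integral, water_max_ods, water_integrals,
                   od_ratio=1.5, integral_ratio=1.10):
    """Criterion (1): compare growth on a substrate with the water controls.

    max_od, integral: maximum OD600 and area under the growth curve on the
    substrate. water_max_ods, water_integrals: the same two quantities for
    the water-control wells.
    """
    od_ok = max_od >= od_ratio * mean(water_max_ods)
    integral_ok = integral >= integral_ratio * mean(water_integrals)
    return od_ok and integral_ok

# Hypothetical wells, not data from the study.
print(uses_substrate(0.45, 6.2, [0.10, 0.12], [1.9, 2.1]))
print(uses_substrate(0.14, 2.1, [0.10, 0.12], [1.9, 2.1]))
```

Criterion (2), a successful genome-wide fitness assay on the substrate, would be combined with this result by a logical OR.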

Additionally, for each of the 30 heterotrophic bacteria, we determined the inhibitory 
concentrations for 34-55 diverse stress compounds including antibiotics, biocides, 
metals, furans, aldehydes, and oxyanions. For each compound, we grew the wild- 
type bacterium across a 1,000-fold range of inhibitor concentrations in a rich 
medium. We used the spline.integral parameter of grofit to fit dose-response curves 
and calculate the half-maximum inhibitory concentration (IC50) values for each 
compound (Supplementary Table 4). For D. vulgaris Miyazaki F and S. elongatus 
PCC 7942, we did not perform these growth prescreen assays; rather, we just per- 
formed the mutant fitness assays across a broad range of inhibitor concentrations. 
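
To make the idea of a half-maximum inhibitory concentration concrete, the sketch below estimates an IC50 from a dilution series. It does not reproduce the grofit dose-response fit used in the study; it swaps in plain linear interpolation on log10 concentration between the two doses that bracket half of the uninhibited response. The function name and all numbers are hypothetical.

```python
import math

def ic50_by_interpolation(concs, responses, control_response):
    """Estimate IC50 by linear interpolation on log10(concentration).

    concs: inhibitor concentrations (> 0). responses: growth measure at each
    concentration (for example the area under the growth curve).
    control_response: the same measure without inhibitor.
    Returns None if the response never crosses half of the control.
    """
    half = 0.5 * control_response
    points = sorted(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo >= half >= r_hi and r_lo != r_hi:
            frac = (r_lo - half) / (r_lo - r_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical 1,000-fold dilution series.
print(ic50_by_interpolation([1, 10, 100, 1000], [10, 8, 2, 0.5], 10))
```
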

Genome sequencing. We sequenced Acidovorax sp. GW101-3H11 and five 
Pseudomonads by using a combination of Illumina and Pacific Biosciences. For 
Illumina-first assembly, we used scythe (https://github.com/vsbuffalo/scythe) 
and sickle (https://github.com/najoshi/sickle) to trim and clean Illumina reads, 
we assembled with SPAdes 3.0°8, we performed hybrid Illumina/PacBio assembly 
on SMRTportal using AHA”, we used BridgeMapper on SMRTportal to fix mis- 
assembled contigs, we mapped Illumina reads to the new assembly with bowtie 
2°, and we used pilon to correct local errors’!. For Acidovorax sp. GW101-3H11, 
we instead used A5* to assemble the Illumina reads and we used AHA to join 
contigs together. For PacBio-first assembly, we used HGAP3 on SMRTportal, we 
used circlator to find additional joins*’, and we again used bowtie 2 and pilon to 
correct local errors. See Supplementary Table 19 for a summary of these genome 
assemblies and their accession numbers. In addition, S. koreensis DSMZ 15582 was 
sequenced for this project by the Joint Genome Institute, using Pacific Biosciences. 

Constructing pools of randomly barcoded transposon mutants. The transposon 
mutant libraries for ten bacteria have been described previously?~!”. The other 22 
bacteria were mutagenized with randomly barcoded plasmids containing a mariner 
or Tn5 transposon, a pir-dependent conditional origin of replication, and a kana- 
mycin resistance marker, using previously described vectors’. The plasmids were 
delivered by conjugation with E. coli WM3064, which is a diaminopimelate auxotroph 
and is pir+. The conditions for mutagenizing each organism are described in 
Supplementary Table 20. Generally, we conjugated mid-log-phase grown WM3064 
donor (either mariner donor plasmid library APA752 or Tn5 donor plasmid library 
APA766) and recipient cells on 0.45 µm nitrocellulose filters (Millipore) overlaid 
on rich medium agar plates supplemented with DAP. We used the rich medium 
preferred by the recipient (Supplementary Table 20). After conjugation, filters were 
resuspended in recipient rich medium and plated on recipient rich medium agar 
plates supplemented with kanamycin. After growth, we scraped together kana- 
mycin-resistant colonies into recipient rich medium with kanamycin, diluted the 
culture back to a starting OD600 of 0.2 in 50-100 ml of recipient rich medium with 
kanamycin, and grew the mutant library to a final OD600 of between 1.0 and 2.0. 
We added glycerol to a final volume of 10%, made multiple 1- or 2-ml −80 °C 
freezer stocks, and collected cell pellets to extract genomic DNA for TnSeq. For 
D. vulgaris Miyazaki F, we selected for G418-resistant transposon mutants in liquid 
medium with no plating step (Supplementary Table 20). 

Transposon insertion site sequencing (TnSeq). Given a pool of mutants, we 
performed TnSeq to amplify and sequence the transposon junction and to link 
the barcodes to a location in the genome’. In this study, we used the TnSeq data 
for two independent analyses. First, we used the mapped transposon insertions 
to identify genes that are likely to be essential for viability (or nearly so) in the 
conditions that we used to select the mutant library (see below). We performed 
this gene essentiality analysis independently of the DNA barcodes and we therefore 
considered transposon insertion locations that were not included in the mutant 
pool definition for BarSeq fitness assays. Second, we used the TnSeq data to define 
the mutant library for high-throughput mutant fitness assays. Here, we analysed 
both the transposon insertion data and its associated random DNA barcode, as 
the RB-TnSeq approach requires accurate association of the genomic insertion 
location of the transposon to a unique, random DNA barcode. We considered a 
barcode to be confidently mapped to a location if this mapping was supported by at 
least ten reads and the barcode mapped primarily to one location’. (For Shewanella 
sp. ANA-3, the threshold was eight reads.) The number of unique barcodes 
(strains) mapped in each mutant library is shown in Supplementary Table 20. Given 
this mapping, the abundance of the strains in each sample can be determined by 
a simpler and cheaper protocol: amplifying the barcodes with PCR followed by 
barcode sequencing". 
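
The rule for a confidently mapped barcode can be sketched as follows. This is an illustration, not the pipeline used in the study. The text gives the read threshold (ten, or eight for Shewanella sp. ANA-3) but no number for "mapped primarily to one location", so `min_fraction` below is an assumed stand-in, and the function name and location tuples are hypothetical.

```python
from collections import Counter

def confident_location(read_locations, min_reads=10, min_fraction=0.75):
    """Return the insertion location for one barcode, or None.

    read_locations: one (scaffold, position, strand) tuple per TnSeq read
    that carries this barcode. min_fraction is an assumption standing in
    for "mapped primarily to one location".
    """
    if not read_locations:
        return None
    location, n_top = Counter(read_locations).most_common(1)[0]
    if n_top >= min_reads and n_top / len(read_locations) >= min_fraction:
        return location
    return None

# Hypothetical reads: 12 agree on one site, 2 map elsewhere.
reads = [("scaffold1", 1500, "+")] * 12 + [("scaffold1", 9000, "-")] * 2
print(confident_location(reads))
```

Barcodes that pass this filter define the strains whose abundance can later be followed by barcode sequencing alone.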

Identifying essential or nearly essential genes. Genes that lack insertions or that 
have very low coverage in the start samples are likely to be essential or important 
for growth (nearly essential) in rich medium, as except for S. elongatus, pools of 
mutants were produced and recovered in medium that contained yeast extract. We 
used previously published heuristics"! to distinguish likely-essential genes from 
genes that are too short or that are too repetitive to map insertions in. Briefly, for 
each protein-coding gene, we computed the total read density in TnSeq (reads/ 
nucleotides across the entire gene) and the density of insertion sites within the 
central 10-90% of each gene (sites/nucleotides). We did not consider the DNA 
barcodes in this analysis of essential genes. Across the 32 bacteria, we mapped 5-66 
insertions for the typical (median) protein-coding gene. We then excluded genes 
that might be difficult to map insertions within because they were very similar to 
other parts of the genome (BLAT score above 50) and also very short genes of less 
than 100 nucleotides. Given the median insertion density and the median length of 
the remaining genes, we investigated how short a gene could be and still be unlikely 
to have no insertions at all by chance (P < 0.02, Poisson distribution). Genes shorter 
than this threshold were excluded; the threshold varied from 100 nucleotides for 
P. inhibens, C. crescentus, and S. elongatus to 675 nucleotides for S. loihica PV-4. For 
the remaining genes, we normalized the read density by GC content by dividing 
by the running median of read density over a window of 201 genes (sorted by GC 
content). We normalized the insertion density so that the median gene’s value was 
1. Protein-coding genes were considered essential or important for growth (nearly 
essential) if we did not estimate fitness values for the gene and both the normalized 
insertion density and the normalized read density were under 0.2. A validation of 
this approach is described in Supplementary Note 1. 
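
Two steps of this procedure are easy to illustrate: the Poisson length threshold and the final cutoff on the normalized densities. The sketch below is a reading of the text, not the authors' code. It assumes that the chance of a gene having no insertion sites is exp(−density × usable length), with the usable length taken as the central 80% of the gene; the function names and the example density are hypothetical.

```python
import math

def min_gene_length(sites_per_nt, p_cutoff=0.02, central_fraction=0.8):
    """Shortest gene length (nt) at which zero insertion sites is unlikely.

    Under a Poisson model the chance of seeing no insertion sites is
    exp(-sites_per_nt * central_fraction * length). Return the smallest
    integer length for which that chance falls below p_cutoff.
    """
    bound = -math.log(p_cutoff) / (sites_per_nt * central_fraction)
    return math.floor(bound) + 1

def likely_essential(norm_insertion_density, norm_read_density,
                     has_fitness_value, cutoff=0.2):
    """Final rule: no fitness estimate and both normalized densities < cutoff."""
    return (not has_fitness_value
            and norm_insertion_density < cutoff
            and norm_read_density < cutoff)

# Hypothetical median density of one insertion site per 100 nucleotides.
print(min_gene_length(0.01))
print(likely_essential(0.05, 0.10, has_fitness_value=False))
```

With the hypothetical density of 0.01 sites per nucleotide, the threshold comes out at 490 nucleotides. Denser libraries give shorter thresholds, in line with the range of thresholds reported across the bacteria.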

Mutant fitness assays. For each mutant library, we performed competitive mutant 
fitness assays under a large number of growth conditions that were chosen on the 
basis of the results of high-throughput growth assays of wild-type bacteria (see 
above). For some bacteria, we also assayed growth at varying pH or temperature, 
motility on agar plates, or survival. The conditions that we profiled varied across 
the bacteria owing to the differing growth capabilities of the bacteria as well as 
experimental limitations. For example, the heterotrophic bacteria we investigated 
showed a wide range in the number of compounds (either carbon or nitrogen) 
that they could utilize for growth in a defined medium (Supplementary Table 2, 
Supplementary Table 3). For example, only 14 of the 32 bacteria were capable of 
using D-xylose as the sole carbon source, and we successfully assayed mutant fitness 
in 12 of these strains. We did not perform carbon and nitrogen source experi- 
ments in K. aquimarina SW-154T and P. actiniarum, as we could not culture these 
bacteria in a defined growth medium. In addition, some stress experiments were 
not performed because of the native resistance of the bacteria to the compound 
(Supplementary Table 4). 

In addition to biological reasons for the differences among the exact set of con- 
ditions we profiled, for some bacteria we did not attempt certain assays. We did 
not systematically test D. vulgaris Miyazaki F for carbon source utilization and 
we did not perform nitrogen source experiments in two heterotrophic bacteria 
that grow in defined medium: D. suillum PS and S. loihica PV-4. For 31 of the 32 
bacteria, we attempted fitness assays for a core set of 32 stress compounds (see 
Supplementary Table 4), but for D. suillum PS, we attempted only 16 of them. We 
successfully studied motility in 12 of the 32 bacteria using a soft agar assay. For the 
other bacteria, we either did not attempt a motility assay or the cells were motile 
but the mutant fitness data did not pass our thresholds for a successful experiment. 
Similarly, there were many growth-based fitness experiments (carbon and nitrogen 
source, and stressors) that we performed but did not pass our metrics for a 
successful experiment’. Although a few dozen of these samples had low read depth, 
which might indicate a problem during PCR of the barcodes or sequencing, we 
believe that most of these experiments failed owing to biological factors that make 
them incompatible with a pooled fitness assay, such as intense positive selection or 
potentially stochastic exit from lag phase’. For example, across all 32 bacteria, there 
are 197 bacterium-stress combinations that lack fitness data because none of the 
experiments for that stress succeeded by our metrics. For 85 of these combinations, 
we attempted more than one experiment. 

The full list of experiments performed for each mutant library along with 
detailed metadata are available in Supplementary Table 5, on Figshare (https://doi. 
org/10.6084/m9.figshare.5134837) and at http://genomics.lbl.gov/supplemental/ 
bigfit/. Our analysis includes 385 successful experiments from Wetmore et al.” 
and 36 successful experiments from Melnyk et al.!?. The other 4,449 successful 
fitness assays are described here for the first time. In general, all growth assays 
with carbon sources, nitrogen sources, and inhibitors were done as previously 
described’. In brief, an aliquot of the mutant library was thawed and inoculated 
into 25 ml rich medium with kanamycin or erythromycin and grown to mid-log 
phase in a flask. The only exception was K. michiganensis, which we recovered 
to stationary phase; experiments with the usual exponential phase start samples 
for this library had suspicious correlations of gene fitness with GC content and 
did not meet our quality metrics. Depending on the mutant library, this growth 
recovery took between 3 and 24 h. After recovery, we collected pellets for genomic 
DNA extraction and barcode sequencing (BarSeq) of the start sample. We used 
the remaining cells to set up multiple mutant fitness assays with diverse carbon 
and nitrogen sources in defined media and diverse inhibitors in rich media, all at 
a starting OD600 of 0.02. In addition, for most bacteria, we profiled growth of the 
mutant library at different pH values and temperatures. After the mutant library 
grew to saturation under the selective growth condition (typically 4-8 population 
doublings), we collected a cell pellet for genomic DNA extraction and BarSeq of 
the ‘end’ sample. As described below, we calculate gene fitness from the barcode 
counts of the end sample relative to the start sample. 

We used a number of different growth formats and media formulations for 
mutant fitness assays across the 32 bacteria. A full list of compound components 
for each growth medium is given in Supplementary Table 18. Many fitness 
assays were done in 48-well microplates (Greiner) with 700 µl culture volume per 
well and grown in a Tecan Infinite F200 plate reader with OD600 measurements 
every 15 min. For these 48-well microplate assays, we combined the cultures from 
two replicate wells before genomic DNA extraction (total volume of experiment, 
1.4 ml). For 24-well microplate experiments, we typically used deep-well plates 
with 1.5 ml (for inhibitors) or 2 ml (for carbon and nitrogen sources) total culture 
volume per well. For six bacteria (C. crescentus, D. japonica UNC79MFTsu3.2, 
E. vietnamensis, K. michiganensis, Pedobacter sp. GW460-11-11-14-LB5 and 
P. actiniarum), we performed all fitness assays in transparent 24-well microplates 
(Greiner) with 1.2 ml total volume per well. All 24-well microplate experiments 
were grown in a Multitron incubating shaker (Innova). For 24-well microplate 
experiments, we typically took the OD600 of each culture every 12-24 h in a Tecan 
plate reader (after transferring the cells to a Greiner 96-well microplate for the 
24-well deep well plate experiments). Because we used different methods for 
measuring OD600 values (cuvettes with a spectrophotometer; 24-, 48- and 96-well 
microplates in a plate reader), we used standard curves to interconvert microplate 
OD600 values to a common reference (cuvette with spectrophotometer). Over 1,000 
experiments, primarily carbon source and temperature experiments, were done 
in glass test tubes with 5 ml culture volumes. For the test tube experiments, we 
monitored OD600 every 12-24 h with a standard spectrophotometer and cuvettes. 

For stress experiments, we aimed to use an inhibitory but sub-lethal concentra- 
tion of the compound, typically at a concentration that resulted in a 50% reduction 
in growth rate (IC50). Both the growth and the inhibition are crucial for identifying 
mutants that have altered sensitivity to the compound. If the concentration of 
inhibitor is too high and there is no growth, then the abundance of the strains will 
not change and all of the fitness values will be near zero. If the concentration of 
inhibitor is too low and there is no growth inhibition, then the fitness pattern is 
likely to be as if the compound were not added at all. We identified these inhibi- 
tory concentrations using the growth curves with the wild-type bacteria described 
above. In addition to the calculated IC50 for a compound, we often used a few 
different concentrations above and below the IC50 to try to capture an inhibitory 
but sub-lethal concentration. We used multiple concentrations of a single 
compound because we found that the IC50 values determined with wild-type bacteria 
in 96-well microplates (see above) sometimes did not agree with the mutant fitness 
experiments, which were done with a complex transposon mutant library in 24 or 
48-well microplates. For assays done in 48-well microplates and grown in a Tecan 
Infinite F200 plate reader, we could confirm that the culture was inhibited relative 
to a no-stress control. For stress assays in 24-well microplates and grown in the 
Multitron shaker, we took OD readings approximately every 12 h to estimate which 
cultures were inhibited. In practice, for a given mutant library, we often collected 
mutant fitness data with different concentrations of the same inhibitor. We also 
collected fitness data in plain rich medium without an added inhibitory compound. 

For some carbon source experiments in D. vulgaris Miyazaki F, which is strictly 
anaerobic, we grew the mutant pool in 18 × 150-mm Hungate tubes with a butyl 
rubber stopper and an aluminium crimp seal (Chemglass Life Sciences) with a 
culture volume of 10 ml and a headspace of about 15 ml. For the remainder of the 
D. vulgaris fitness experiments, we grew the mutant pool in 24-well microplates 
inside an anaerobic chamber. We used OD600 measurements to determine which 
cultures were inhibited by varying concentrations of stress compounds. Similarly, 
for six of the other heterotrophs, we measured gene fitness during anaerobic 
growth. All anaerobic media were prepared within a Coy anaerobic chamber with 
an atmosphere of about 2% H2, 5% CO2, and 93% N2. 

For S. elongatus PCC 7942, which is strictly photosynthetic, we recovered the 
library from the freezer in BG-11 medium at a light level of 7,000 lx and we 
conducted fitness assays at 9,250 lx. We used OD750 to measure the growth of S. elongatus. 
Most S. elongatus mutant fitness assays were done in the wells of a 12-well 
microplate (Falcon) with a 5 ml culture volume. 

For motility assays, the mutant pool was inoculated into the centre of a 0.3% 
agar rich medium plate and ‘outer’ samples with motile cells were removed with a 
razor after 24-48 h. In many instances, we also removed an ‘inner’ sample of cells 
from near the point of inoculation. Not all bacteria we assayed were motile in this 
soft agar assay and others were motile but did not give mutant fitness results that 
passed our quality metrics. 

In four bacteria, we also assayed survival. In these assays, a mutant pool was 
subjected to a stressful condition (either extended stationary phase, starvation, or a 
low temperature of 4 °C) for a defined period; then, to determine which strains were 
still viable, they were recovered in rich medium for a few generations. After recovery 
in rich medium, the cells were harvested for genomic DNA extraction and BarSeq. 

We performed replicate experiments (not necessarily at the same concentration) 
for 25 of the 29 bacteria with carbon source data; the exceptions are B. phytofirmans 
PsJN, P. fluorescens FW300-N1B4, P. fluorescens FW300-N2E3, and S. meliloti 1021. 
Similarly, we performed replicate experiments (not necessarily at the same concen- 
tration) for 17 of the 28 bacteria with nitrogen source data. The bacteria with nitrogen 
source data but without biological replicates are: Acidovorax sp. GW101-3H11, B. 
phytofirmans PsJN, C. basilensis 4G11, D. japonica UNC79MFTsu3.2, H. seropedicae 
SmRI1, P. fluorescens FW300-N1B4, P. fluorescens FW300-N2E3, P. fluorescens 
FW300-N2C3, P. fluorescens GW456-L13, S. meliloti 1021, and P. simiae WCS417. 
Lastly, we performed replicates (not necessarily at the same concentration) for 
the majority of stress experiments for 16 of the 32 bacteria: C. crescentus NA1000, 
E. vietnamensis, D. shibae DFL-12, K. aquimarina SW-154T, S. koreensis DSMZ 
15582, K. michiganensis M5al, M. adhaerens HP15, D. vulgaris Miyazaki F, 
S. oneidensis MR-1, P. inhibens BS107, P. actiniarum, D. suillum PS, P. stutzeri RCH2, 
Pedobacter sp. GW460-11-11-14-LB5, S. loihica PV-4, and S. elongatus PCC 7942. 
Barcode sequencing (BarSeq). Genomic DNA extraction and barcode PCR were 
performed as described previously’. Most genomic DNA extractions were done in 
a 96-well format using a QIAcube HT liquid handling robot (QIAGEN). We used 
the 98 °C BarSeq PCR protocol’, which is less sensitive to high GC content than 
the original 95 °C protocol. In general, we multiplexed 48 samples or 96 samples 
per lane of Illumina HiSeq. For E. coli, we sequenced 96 samples per lane. For 
sequencing runs on the Illumina HiSeq4000, we sequenced 96 samples per lane. 
For the HiSeq4000 runs, we used an equimolar mixture of four common P1 oligos 
for BarSeq, with variable lengths of random bases at the start of the sequencing 
reactions (2-5 nucleotides) (Supplementary Table 15). We did this to phase the 
amplicons for sequencing and to aid in cluster discrimination on the HiSeq4000. 
Computation of fitness values. Fitness data were analysed as previously described’. 
In brief, the fitness value of each strain (an individual transposon mutant) is the 
normalized log2(strain barcode abundance at end of experiment/strain barcode 
abundance at start of experiment). The fitness value of each gene is the weighted 
average of the fitness of its strains. In this study, we restricted our analysis to the 
123,255 different non-essential protein-coding genes for which we collected gene 
fitness data. Gene fitness values describe the relative abundance of the mutant 
strains in a condition, regardless of their fitness in rich medium. Only strains 
containing insertions within the central 10-90% of the gene and with sufficient 
abundance, on average, in the start samples were included in these calculations; 
we used 3-26 mutant strains for the typical protein-coding gene in each bacterium 
(Supplementary Table 20). Specifically, in the average start samples, we required 
that each gene have at least 30 reads, and each individual strain for that gene to 
have at least 3 reads. Because we include the same set of strains in the analysis of 
each experiment, there are no instances where a gene has a fitness value in one 
experiment but not in another. The gene fitness values were then normalized to 
remove the effects of variation in gene copy number, for example owing to variation 
in plasmid copy number relative to the chromosome or to higher effective copy 
number near the origin of replication in actively dividing cells. The median for 
each scaffold was set to zero, and for large scaffolds (over 250 genes), the running 
median of the gene fitness values was subtracted. Also, for large scaffolds (over 250 
genes), the peak of the distribution of gene fitness values was set to zero. 
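
The calculation described in this section can be illustrated with a minimal Python sketch. It is not the published pipeline: the pseudocount, the weighting of strains by their start counts, and the function names are assumptions made for illustration, and only the simple per-scaffold median centring is shown (the running median and the peak adjustment for large scaffolds are omitted).

```python
import math
from statistics import median

PSEUDOCOUNT = 1.0  # assumption: avoids log of zero; not specified in the text


def strain_fitness(start_count, end_count):
    """Unnormalized log2 ratio of end to start barcode counts for one strain."""
    return math.log2((end_count + PSEUDOCOUNT) / (start_count + PSEUDOCOUNT))


def gene_fitness(strains, min_gene_reads=30, min_strain_reads=3):
    """Weighted average of strain fitness values for one gene.

    `strains` is a list of (start_count, end_count, relative_position) tuples,
    where relative_position is the insertion site as a fraction of gene length.
    Returns None if the gene has too few reads in the start sample.
    """
    # Keep insertions in the central 10-90% of the gene with enough start reads.
    usable = [
        (s, e) for s, e, pos in strains
        if 0.1 <= pos <= 0.9 and s >= min_strain_reads
    ]
    total_weight = sum(s for s, _ in usable)
    if total_weight < min_gene_reads:
        return None
    # Assumption: weight each strain by its start count (hypothetical weighting).
    return sum(s * strain_fitness(s, e) for s, e in usable) / total_weight


def center_scaffold(fitness_by_gene):
    """Subtract the scaffold median so that the typical gene has fitness zero."""
    mid = median(v for v in fitness_by_gene.values() if v is not None)
    return {g: (None if v is None else v - mid) for g, v in fitness_by_gene.items()}
```

Because the set of usable strains depends only on the start samples, the same strains enter every experiment, which is why a gene either has a fitness value in all experiments or in none.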

Fitness experiments were deemed successful using the quality metrics that we 
described previously’. These metrics ensure that the typical gene has sufficient 
coverage, that the fitness values of independent insertions in the same gene are 
consistent, and that there is no GC bias’. Experiments that did not meet these 
thresholds were excluded from our analyses. The remaining experiments showed 
good agreement between exact replicates, with a median correlation of 0.87 for 
gene fitness values from defined media experiments. Ninety-five per cent of the 
replicate defined media experiments had r > 0.64. Stress experiments sometimes 
have little biological signal, as they are usually done in rich medium and mutants 
of genes that are important for growth in rich medium may be absent from the 
pools. Nevertheless, the median correlation between replicate stress experiments 
was 0.74, and 95% of replicate stress experiments had r > 0.35. 

To estimate the reliability of the fitness value for a gene in a specific experiment, 
we use a t-like test statistic, which is the gene’s fitness divided by the standard 
error’. The standard error is the maximum of two estimates. The first estimate 
is based on the consistency of the fitness for the strains in that gene. The second 
estimate is based on the number of reads for the gene. 
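
A rough Python sketch of this statistic follows. The text specifies only that the standard error is the maximum of a strain-consistency estimate and a read-count estimate; the particular forms used here (standard error of the mean across strains, and a Poisson-style noise term for a log2 ratio) are assumptions chosen for illustration, and the exact estimators are given in the cited earlier work.

```python
import math
from statistics import stdev


def se_from_strains(strain_fitness_values):
    """Standard error of the mean of the per-strain fitness values (needs >= 2 strains)."""
    n = len(strain_fitness_values)
    return stdev(strain_fitness_values) / math.sqrt(n)


def se_from_reads(start_reads, end_reads):
    """Assumed Poisson-style noise floor for a log2 ratio of two read counts."""
    return math.sqrt(1.0 / (1 + start_reads) + 1.0 / (1 + end_reads)) / math.log(2)


def t_like(gene_fitness, strain_fitness_values, start_reads, end_reads):
    """Gene fitness divided by the larger (more conservative) of the two standard errors."""
    se = max(se_from_strains(strain_fitness_values),
             se_from_reads(start_reads, end_reads))
    return gene_fitness / se
```

Taking the maximum guards against two failure modes: a gene whose few strains happen to agree closely but have very few reads, and a gene with many reads whose strains disagree.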

Even mild phenotypes were quite consistent between replicate experiments if 
they were statistically significant. For example, if a gene had a mild but significant 
phenotype in one replicate (0.5 < |fitness| < 2 and |t| > 4), then the sign of 
the fitness value was the same in the other replicate 96.7% of the time. Because 
this comparison might be biased if the two replicates were compared to the same 
control sample, only replicates with independent controls were included. 

Genes with statistically significant phenotypes. We averaged fitness values from 
exact replicate experiments. We combined t scores across replicate experiments 
with two different approaches. If the replicates did not share a start sample and were 
entirely independent, then we used t_comb = sum(t)/sqrt(n), where n is the number 
of replicates. But if the replicates used the same start sample then this metric would 
be biased. To correct for this, we assumed that the start and end samples have sim- 
ilar amounts of noise. This is conservative because we usually sequenced the start 
samples with more than one PCR and with different multiplexing tags. Given this 
assumption and given that variance(A + B) = variance(A) + variance(B) if A and 
B are independent random variables, it is easy to show that the above estimate of 
t_comb needs to be divided by sqrt((n² + n)/(2n)), where n is the number of replicates. 
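
The correction can be sketched in Python as follows; the function name and signature are hypothetical. The reasoning behind the factor, under the stated assumption of equal noise variance σ² in start and end samples, is that each replicate's log ratio has variance 2σ². With n independent replicates the sum has variance 2nσ², whereas with a shared start sample the start noise adds coherently and the sum has variance nσ² + n²σ². The ratio of the two is (n² + n)/(2n).

```python
import math


def combine_t(t_values, shared_start):
    """Combine per-replicate t scores into a single combined t score."""
    n = len(t_values)
    t_comb = sum(t_values) / math.sqrt(n)
    if shared_start:
        # Replicates that share a start sample are positively correlated,
        # so shrink the statistic by sqrt((n^2 + n) / (2n)).
        t_comb /= math.sqrt((n * n + n) / (2 * n))
    return t_comb
```

For a single replicate the correction factor is 1, so the two approaches agree, and the penalty grows with the number of replicates that reuse one start sample.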

Our standard threshold for a statistically significant phenotype was |fitness| > 0.5 
and |combined t| > 4, but this was increased for some bacteria to maintain a false 
discovery rate (FDR) of less than 5%. We use a minimum threshold on |fitness| as well 
as |t| to account for imperfect normalization or for other small biases in the fitness 
values. To estimate the number of false positives, we used control experiments, that 
is, comparisons between different measurements of different aliquots of the same 
start sample. However, we did not use some previously published control 
comparisons” that used the old PCR settings and had strong GC bias (to exclude these, 
we used the same thresholds that we used to remove biased experiments). The 
estimated number of false positive genes was then the number of control measure- 
ments that exceeded the thresholds, multiplied by the number of conditions and 
divided by the number of control experiments. As a second approach to estimate 
the number of false positives, we used the number of expected false positives if 
the t_comb values follow the standard normal distribution (2 × P(z > t) × #experiments 
× #genes). If either estimate of the false discovery rate was above 5%, we 
raised our thresholds for both |fitness| and |t_comb| in steps of 0.1 and 0.5, respectively, 
until FDR < 5%. The highest thresholds used were |fitness| > 0.9 and |t| > 6. Also, 
for Pseudomonas fluorescens FW300-N1B4, we identified six genes with large dif- 
ferences between control samples (|fitness| ~ 2 and |t| ~ 6). These genes cluster on 
the chromosome in two groups and have high cofitness, and several of the genes 
are annotated as being involved in capsular polysaccharide synthesis. Because this 
bacterium is rather sticky, we suspect that mutants in these genes are less adherent 
and were enriched in some control samples owing to insufficient vortexing, so these 
six genes were excluded when estimating the number of false positives. 
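
The threshold-raising procedure can be sketched in Python as shown below. This is an illustration under assumptions: the data layout (lists of (fitness, t) pairs), the function names, and the choice to raise both thresholds together at each step are not taken from the text, although joint stepping is consistent with the highest thresholds reported (0.9 and 6).

```python
import math


def normal_tail(t):
    """P(z > t) for a standard normal variable."""
    return 0.5 * math.erfc(t / math.sqrt(2))


def count_hits(measurements, min_fit, min_t):
    """Count (fitness, t) pairs that exceed both thresholds in absolute value."""
    return sum(1 for f, t in measurements if abs(f) > min_fit and abs(t) > min_t)


def choose_thresholds(data, controls, n_conditions, n_control_expts,
                      n_expts, n_genes, max_fdr=0.05):
    """Raise (|fitness|, |t|) from (0.5, 4) until both FDR estimates are below max_fdr."""
    step = 0
    while True:
        min_fit, min_t = 0.5 + 0.1 * step, 4.0 + 0.5 * step
        hits = count_hits(data, min_fit, min_t)
        # Estimate 1: scale the control false positives to the number of conditions.
        fp_control = count_hits(controls, min_fit, min_t) * n_conditions / n_control_expts
        # Estimate 2: expected two-sided tail count under a standard normal.
        fp_normal = 2 * normal_tail(min_t) * n_expts * n_genes
        if hits == 0 or max(fp_control, fp_normal) / hits < max_fdr:
            return round(min_fit, 1), min_t
        step += 1
```

The loop always terminates because the number of hits eventually reaches zero as the thresholds rise.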

Sequence analysis. To assign genes to Pfams’” or TIGRFAMs'®, we used HMMer 
3.1b1*4 and the trusted score cutoff for each family. We used Pfam 28.0 and 
TIGRFAM 15.0. We used only the curated families in Pfam (‘Pfam A’). 

To identify putative orthologues between pairs of genomes, we used bidirectional 
best protein BLAST hits with at least 80% alignment coverage both ways. We did 
not use any cutoff on similarity, as a similarity of phenotype can show that distant 
homologues have conserved functions. For the analysis of conserved specific pheno- 
types with cisplatin stress and xylose carbon catabolism, we used greedy clustering 
to go from pairs of orthologues to orthologue groups. In some instances, a single 
protein family was split into multiple orthologue groups and combined manually. 
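
A minimal Python sketch of the bidirectional-best-hit step is given below. The input layout (tuples of query, subject, bit score, query coverage, and subject coverage) and the function names are assumptions for illustration; parsing of BLAST output and the greedy clustering into orthologue groups are not shown.

```python
def best_hits(hits, min_coverage=0.8):
    """Map each query to its top-scoring subject among hits with enough coverage.

    `hits` is a list of (query, subject, bit_score, query_cov, subject_cov).
    """
    best = {}
    for query, subject, score, qcov, scov in hits:
        # Require at least 80% alignment coverage of both proteins.
        if qcov < min_coverage or scov < min_coverage:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, min_coverage=0.8):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, min_coverage)
    reverse = best_hits(b_vs_a, min_coverage)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

Note that no similarity cutoff is applied, in keeping with the text: only coverage and reciprocity decide whether a pair is kept.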

To estimate the evolutionary relationships of the bacteria that we studied 
(Fig. 1b), we used Amphora2” to identify 31 highly conserved proteins in each 
genome and to align them, we concatenated the 31 protein alignments, and we 
used FastTree 2.1.8°° to infer a tree. 
Specific phenotypes. We defined a specific phenotype for a gene in an experiment 
as: |fitness| > 1 and |t| > 5 in this experiment; |fitness| < 1 in at least 95% of 
experiments; and the fitness value in this experiment is noticeably more extreme than 
most of its other fitness values (|fitness| > 95th percentile(|fitness|) + 0.5). Our 
definition of specific phenotypes is sensitive to the conditions that we profiled. 
For example, an amino acid biosynthetic gene with an auxotrophic phenotype in 
defined medium will probably not be identified with a specific phenotype in any of 
our experiments, as such a gene is expected to have a severe fitness defect in nearly 
all defined media fitness assays. To minimize this issue for the stress experiments, 
we chose stress compounds based on the variable genome-wide mutant fitness 
patterns they elicited from a prior study’. 
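
The three criteria for a specific phenotype can be expressed as a short Python sketch. The function names are hypothetical, and the nearest-rank percentile is one of several conventions; the text does not say which was used.

```python
import math


def percentile_95(values):
    """95th percentile by the nearest-rank method (an assumed convention)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def has_specific_phenotype(fitness, t, index):
    """Test one gene in experiment `index`, given its values across all experiments."""
    abs_fit = [abs(f) for f in fitness]
    # Strong and significant in this experiment.
    strong_here = abs_fit[index] > 1 and abs(t[index]) > 5
    # Little or no phenotype in at least 95% of experiments.
    mostly_neutral = sum(1 for f in abs_fit if f < 1) >= 0.95 * len(abs_fit)
    # Noticeably more extreme than most of the gene's other values.
    stands_out = abs_fit[index] > percentile_95(abs_fit) + 0.5
    return strong_here and mostly_neutral and stands_out
```

A gene with a severe defect in every experiment fails the second criterion, which is the auxotroph case discussed above.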

To demonstrate the reliability of specific phenotypes on a broader scale, we 
investigated how often genes with specific phenotypes for the utilization of carbon 
sources could be assigned to their annotated SEED subsystems*° (Supplementary 
Table 21). Across 32 carbon sources, we identified specific-important phenotypes 
(a specific phenotype with fitness < 0) for 643 genes that are linked to any SEED sub- 
system, and 388 of these (60%) were linked to the expected subsystem, which is far 
higher than the 1% that would be expected by chance (P< 107}, Fisher’s exact test). 

We considered a specific phenotype to be conserved if a potential orthologue 
had a specific phenotype with the same sign in a similar experiment with the same 
carbon source, nitrogen source, or stressful compound (but not necessarily using 
the same base medium or the same concentration of the compound). For 
specific-important phenotypes (fitness < −1), we also considered a specific phenotype to be 
conserved if a potential orthologue had fitness < −1 and t < −4 in a similar 
experiment, regardless of how many experiments the orthologue had a phenotype in. 
Across our entire data set, many of the specific phenotypes were conserved, 
which supports their functional relevance. For specific-important phenotypes, if 
a putative orthologue had a fitness measurement in the corresponding condition, 
then 34% of the time, the orthologue had fitness < −1. Even for orthologues in 
the same genus, the phenotype was conserved just 46% of the time, so we believe 
that the lack of conservation is usually due to indirect phenotypes rather than due 
to the orthologues having different functions. For strong specific-important 
phenotypes (fitness < −2) in defined medium conditions, the conservation rate rose 
to 54% (or 72% within a genus); it appears that strong and specific phenotypes in 
defined medium are particularly likely to indicate a direct association. In contrast, 
the conservation rate of specific and detrimental phenotypes (fitness > 1) was just 
6%, which suggests that most of these associations are indirect. So, we consider a 
detrimental association to be conserved only if an orthologue meets the criteria 
for a specific (and detrimental) phenotype in that condition. 
Cofitness and conserved cofitness. We calculate cofitness (r) as the Pearson 
correlation of all of the fitness values for a pair of genes within a single bacterium*". 
By contrast, conserved cofitness is identified by comparing cofitness scores for 
pairs of orthologous genes from two bacteria. Conserved cofitness is defined as 
minimum(cofitness in the source genome, max(cofitness of orthologues in other 
genomes)). In other words, a pair of genes is conserved cofit at 0.6 if cofitness > 0.6 
in the source genome and any orthologous pair had cofitness > 0.6. Note that the 
calculations of cofitness and conserved cofitness do not take into account the exper- 
imental conditions, so it is common for cofit gene pairs to have different phenotypes 
in the two bacteria. We also note that the number of cofitness associations in our 
data are far higher than if the data were just noise. For example, we resampled the 
gene fitness values from the control comparisons (between ‘start’ samples) to make 
a new data set with the same number of experiments in each bacterium but with 
no biological signal. With these data, there were no cases of conserved cofitness 
above 0.6 and there were just four genes with cofitness in a bacterium above 0.8. 
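
The two definitions can be written as a small Python sketch. The function names are hypothetical, and returning None when no orthologous pair has data is an assumption made for this illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of fitness values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def conserved_cofitness(source_cofitness, orthologue_cofitness):
    """min(cofitness in the source genome, max cofitness of orthologous pairs)."""
    if not orthologue_cofitness:
        return None  # assumption: undefined when no orthologous pair has data
    return min(source_cofitness, max(orthologue_cofitness))
```

Because the score is a minimum over the source genome and the best orthologous pair, a pair is conserved cofit at 0.6 only if both of those values exceed 0.6.
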
Our fitness calculations for stress experiments do not correct for the fitness 
values in the unstressed condition. This leads to the possibility of cofitness between 
genes that are important for fitness in every condition but might not have any 
functional relationship. However, even genes that are mildly important for fitness 
in most conditions often have different phenotypes in a subset of conditions. For 
example, Extended Data Fig. 1b compares the gene fitness values for S. loihica 
PV-4 growing in LB or in LB with 0.8 mM zinc sulfate. There are 26 genes that 
have fitness < −2 in both conditions, and as you might expect, these genes are 
important for fitness in most of the LB stress experiments. So one might expect 
these genes to give spurious results for cofitness. However, of these 26 genes, 13 
are significantly detrimental to fitness in some condition (fitness > 0). Even the 
remaining 13 genes seem to be important in most, but not all conditions. For exam- 
ple, Shew_1093 (peroxide stress resistance protein yaaA) is important for fitness in 
most conditions but has little or no defect with α-ketoglutarate as the carbon source 
(two replicates, fitness = −0.3 and −0.4). Of the 13 genes that lack a detrimental 
phenotype, none has high-scoring cofitness (r > 0.8) with any gene in S. loihica 
PV-4. Four of these genes did meet our criteria for conserved cofitness, and three 
of the four cases are biologically plausible (ruvC with recG, which is also involved 
in recombinational DNA repair; recG with ruvC; and gspL with other gsp genes). 
More broadly, across 30 bacteria there are 1,797 protein-coding genes in our data 
set that are important for fitness in our base medium (average fitness < −1). (For 
the two other bacteria, we do not have fitness data from the base medium.) Of those 
1,797 genes, 758 have significant detrimental phenotypes in other conditions, so 
we do not expect spurious cofitness. Of the remaining 1,039 protein-coding genes, 
just 343 (33%) met our criteria for a functional association based on cofitness. As 
illustrated by our analysis of the four such cases in PV-4, we expect that many of 
these associations are genuine. Therefore, we believe that very few of the thousands 
of cofitness associations we identified are due to functionally unrelated gene pairs 
with consistently deleterious phenotypes. 
Comparison to other ways of computing cofitness. A related caveat in interpreting 
the cofitness values is that, if some genes are moderately important for fitness in 
every condition, they will have high cofitness owing to the varying number of 
generations in the fitness assays, even if there is no functional relationship. This 
problem could be avoided if the fitness values were divided by the number of 
generations for which the cells grew, but we prefer not to rescale the fitness values 
in this way for several reasons. First, it is not clear what the rescaling should be 
for motility or survival experiments. Second, we expect that the rescaling would 
amplify the experimental noise in experiments with fewer generations. Third, any 
genes whose mutants have strong fitness defects should be absent from our data set 
anyway, as these mutants should be at very low abundance after the construction 
of the mutant library and outgrowth from the freezer. 
We tested the impact of rescaling the fitness values before computing 
cofitness for six bacteria (C. crescentus NA1000, E. vietnamensis, D. japonica 
UNC79MFTsu3.2, K. michiganensis M5al, P. actiniarum, and Pedobacter sp. 
GW460-11-11-14-LB5). For these bacteria, we have accurate measurements of 
the total number of generations of growth for every successful fitness assay. Across 
these six bacteria, if we consider the 2,905 query genes whose most-cofit gene had 
cofitness above 0.8 (without rescaling), then 95% of these pairs had cofitness above 
0.8 after rescaling. Also, in all but three cases, the top cofitness hit was still in the 
top five hits after we rescaled the cofitness values. The distributions of cofitness 
values for functionally related gene pairs (same TIGR subrole) or unrelated pairs 
(different subrole) were also very similar with or without rescaling: for each organ- 
ism and each set of gene pairs, the Kolmogorov-Smirnov distance (D) between the 
two distributions was between 0.01 and 0.02. (D is a non-parametric measure of the 
dissimilarity between two distributions and ranges from 0 for identical distribu- 
tions to 1 for distributions that do not overlap.) Overall, the cofitness values were 
very similar regardless of whether the fitness values were scaled by the number of 
generations or not (Extended Data Fig. 10a). 
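
For readers unfamiliar with the statistic, the Kolmogorov-Smirnov distance D between two samples can be computed with the following pure-Python sketch (the function name is hypothetical; a statistics library would normally be used instead).

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so checking those is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples give D = 0 and samples with no overlap give D = 1, matching the range described above.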

Also note that we compute cofitness across all experiments, without averaging 
the replicates. To quantify the impact of averaging the biological replicate fitness 
experiments on the cofitness values, we computed, for each of the 32 bacteria, the 
Kolmogorov-Smirnov distance (D) between the distribution of cofitness values 
for functionally unrelated genes (with different TIGR subroles) as computed with 
replicates averaged or not. For all 32 bacteria, D < 0.08. In other words, the distri- 
butions of cofitness are quite similar whether or not the biological replicates are 
averaged. We also verified that the vast majority of the top cofitness hits were still 
high-scoring hits if we averaged the replicates before computing cofitness. Of the 
top cofitness hits with r > 0.8, 95% were in the top five hits and above 0.8 if we 
used replicate-averaged cofitness. Another 3% of these cases had replicate-averaged 
cofitness in the top five hits and between 0.7 and 0.8. Overall, the cofitness values 
were very similar regardless of whether replicate experiments were averaged or not 
(Extended Data Fig. 10b). In our data set, 99% of bacterium–condition–concentration 
combinations have 1-5 replicates, which may explain why averaging the 
replicates did not have much effect on the cofitness values. 

Functional associations and conserved functional associations. In a single 
bacterium, a gene has a functional association if it has a specific phenotype or high 
cofitness with another gene. A gene has a conserved functional association if it has 
a conserved specific phenotype or conserved cofitness. 

Predicting TIGR subroles from cofitness or conserved cofitness. We considered 
only cofitness hits that were in the top 10 hits for a query gene, only hits with cofit- 
ness above 0.4 (or conserved cofitness above 0.4), and only hits that were at least 
20 kb from the query gene. For this analysis, we did not consider nearby genes in 
the genome because (1) our data have subtle biases from chromosomal position’, 
which can inflate cofitness*’; (2) nearby genes may be in the same operon and may 
be spuriously cofit because of polar effects (that is, when a transposon insertion in 
an upstream gene of an operon blocks transcription of the downstream genes); and 
(3) functional relationships between nearby genes can be identified by comparative 
genomics*® and may be less useful. We predict that the gene has the same subrole 
as the best-scoring hit. When testing cofitness, the hits with the highest cofitness 
were considered first. When testing conserved cofitness, the hits with the highest 
conserved cofitness (as defined above) were considered first. 
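
The prediction rule can be sketched in Python as follows. Several details are assumptions made for this illustration rather than statements from the text: the top 10 hits are taken before the distance filter is applied, genes on different scaffolds are never treated as nearby, hits without a known subrole are skipped, and all names are hypothetical.

```python
def predict_subrole(query, hits, subroles, positions,
                    top_n=10, min_cofit=0.4, min_distance=20_000):
    """Return the subrole of the best-scoring eligible hit, or None.

    hits: list of (gene, score) for the query, where score is cofitness
          or conserved cofitness.
    subroles: gene -> known subrole.
    positions: gene -> (scaffold, coordinate in bp).
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
    for gene, score in ranked:
        if score <= min_cofit:
            break  # remaining hits score even lower
        same_scaffold = positions[gene][0] == positions[query][0]
        if same_scaffold and abs(positions[gene][1] - positions[query][1]) < min_distance:
            continue  # skip genes within 20 kb of the query
        if gene in subroles:
            return subroles[gene]
    return None
```

The same function serves both tests; only the score passed in (cofitness or conserved cofitness) changes.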

Genome and gene annotations. For previously published genomes, gene anno- 
tations were taken from MicrobesOnline, Integrated Microbial Genomes (IMG), 
or RefSeq. Newly sequenced genomes were annotated with RAST, except that S. 
koreensis DSMZ 15582 was annotated by IMG and P. fluorescens FW300-N2C3 and 
P. fluorescens FW300-N2E3 were annotated by the NCBI pipeline. See Supplementary 
Table 22 for a summary of genome annotations and their accession numbers. 
Statistical analysis. For Fig. 2b, we chose a sample size of 40 genes with non- 
conserved phenotypes to limit the manual effort for the analysis. We expected a 
large difference in the proportions such as 80% versus 40%, and the power to detect 
such a difference was over 98% for α = 0.05. 

Classification of how informative annotations are. To assess the existing com- 
putational annotations for the 32 genomes, we classified all of their proteins into 
one of four groups: detailed TIGR role, hypothetical, vague, or other detailed. 
(1) ‘Detailed TIGR role’ includes proteins that belong to a TIGRFAM role other 
than ‘Unclassified’, ‘Unknown function’, or ‘Hypothetical proteins’ and had a 
subrole other than ‘Unknown substrate’, ‘Two-component systems’, ‘Role category 
not yet assigned’, ‘Other’, ‘General’, ‘Enzymes of unknown specificity’, ‘Domain’, 
or ‘Conserved’. (2) ‘Hypothetical’ includes proteins whose annotation contained 
the phrase ‘hypothetical protein’, ‘unknown function’, ‘uncharacterized’, or if the 
entire description matched ‘TIGRnnnnn family protein’ or ‘membrane protein’. 
(3) Proteins were considered to have ‘vague’ annotations if the gene description 
contained ‘family’, ‘domain protein’, ‘related protein’, ‘transporter related’, or if the 
entire description matched common non-specific annotations (‘abc transporter 
atp-binding protein’, ‘abc transporter permease’, ‘abc transporter substrate-binding 
protein’, ‘abc transporter’, ‘acetyltransferase’, ‘alpha/beta hydrolase’, ‘aminohydrolase’, 
‘aminotransferase’, ‘atpase’, ‘dehydrogenase’, ‘dna-binding protein’, ‘fad-dependent 
oxidoreductase’, ‘gcn5-related n-acetyltransferase’, ‘histidine kinase’, ‘hydrolase’, 
‘lipoprotein’, ‘membrane protein’, ‘methyltransferase’, ‘mfs transporter’, 
‘oxidoreductase’, ‘permease’, ‘porin’, ‘predicted dna-binding transcriptional regulator’, 
‘predicted membrane protein’, ‘probable transmembrane protein’, ‘putative 
membrane protein’, ‘response regulator receiver protein’, ‘rnd transporter’, ‘sam-dependent 
methyltransferase’, ‘sensor histidine kinase’, ‘serine/threonine protein kinase’, ‘signal 
peptide protein’, ‘signal transduction histidine kinase’, ‘tonb-dependent receptor’, 
‘transcriptional regulator’, ‘transcriptional regulators’, or ‘transporter’). The 
remaining proteins were considered to have ‘other detailed’ annotations. 
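
The four-way classification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name classify, the case-insensitive matching, and the order in which the rules are tried (TIGR role first, then hypothetical, then vague) are assumptions, and the set of whole-description matches is abbreviated to a handful of the terms listed in the text.

```python
import re

# Top-level TIGRFAM roles and subroles treated as uninformative (from the text).
UNINFORMATIVE_ROLES = {"Unclassified", "Unknown function", "Hypothetical proteins"}
UNINFORMATIVE_SUBROLES = {
    "Unknown substrate", "Two-component systems", "Role category not yet assigned",
    "Other", "General", "Enzymes of unknown specificity", "Domain", "Conserved",
}
HYPOTHETICAL_PHRASES = ("hypothetical protein", "unknown function", "uncharacterized")
VAGUE_PHRASES = ("family", "domain protein", "related protein", "transporter related")
# Abbreviated on purpose: only a few of the whole-description matches in the text.
VAGUE_EXACT = {"abc transporter", "atpase", "dehydrogenase", "hydrolase",
               "permease", "porin", "transcriptional regulator", "transporter"}
TIGR_FAMILY = re.compile(r"tigr\d{5} family protein")


def classify(description, role=None, subrole=None):
    """Return one of the four annotation groups for a single protein.

    The rule order is an assumption; the text does not state precedence.
    """
    if (role is not None and role not in UNINFORMATIVE_ROLES
            and subrole is not None and subrole not in UNINFORMATIVE_SUBROLES):
        return "detailed TIGR role"
    text = description.strip().lower()
    if (any(phrase in text for phrase in HYPOTHETICAL_PHRASES)
            or TIGR_FAMILY.fullmatch(text)
            or text == "membrane protein"):
        return "hypothetical"
    if any(phrase in text for phrase in VAGUE_PHRASES) or text in VAGUE_EXACT:
        return "vague"
    return "other detailed"
```

For example, under these assumptions classify("LysR family transcriptional regulator") falls into the vague group because the description contains ‘family’, whereas classify("cytosine deaminase") with no TIGRFAM role is treated as other detailed.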

To identify a subset of the proteins annotated as ‘hypothetical’ or ‘vague’ that do 
not belong to any characterized families, we relied on Pfam and TIGRFAM. A Pfam 
family was considered to be uncharacterized if its name began with either DUF 
or UPF (which is short for ‘uncharacterized protein family’). A TIGRFAM family 
was considered to be uncharacterized if it had no link to a role or if the top-level 
role was ‘Unknown function’. To identify poorly annotated proteins from diverse 
bacteria (for Extended Data Fig. 9), we used the rules for vague annotations only. 
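
As a minimal sketch of the two family-level rules just stated, assuming a family is represented only by its Pfam name or by its TIGRFAM top-level role (the function names and the use of None for a family with no role link are hypothetical):

```python
def pfam_is_uncharacterized(name):
    """A Pfam family counts as uncharacterized if its name begins with DUF or UPF."""
    return name.startswith(("DUF", "UPF"))


def tigrfam_is_uncharacterized(top_level_role):
    """A TIGRFAM family counts as uncharacterized if it has no role link
    (represented here as None) or its top-level role is 'Unknown function'."""
    return top_level_role is None or top_level_role == "Unknown function"
```

With these helpers, DUF1052 and UPF0126 are both flagged as uncharacterized Pfam families.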

Protein re-annotation. We manually examined the genes with specific-important 
phenotypes (fitness < −1 and met criteria for a specific phenotype) 
on carbon sources in 25 bacteria (we excluded C. crescentus NA1000, D. japonica 
UNC79MFTsu3.2, E. vietnamensis, H. seropedicae SmR1, K. michiganensis M5al, 
Pedobacter sp. GW460-11-11-14-LB5, and P. actiniarum). We focused on genes 
with strongly important phenotypes (fitness < −2) whose annotations (in SEED 
and/or KEGG) did not imply a role in using that nutrient. In some cases, we also 
tried to identify the complete catabolic pathway and used cofitness to find 
additional proteins that were involved but did not meet the threshold for a specific 
phenotype. For putative transporters, we also considered strong specific-important 
phenotypes (fitness < −2) on nitrogen sources. The primary tools that we used 
were the Fitness Browser (which includes Pfam, TIGRFAM, SEED, and 
the public release of KEGG), MetaCyc, PaperBLAST, and the Conserved 
Domains Database (CDD). (We used SEED/RAST only for the initial annotation 
of a subset of the genomes in this study, but the Fitness Browser stores the SEED/RAST 
annotation of every protein sequence.) For proteins that appeared to lack 
correct annotations in the public release of KEGG, we checked the KEGG website 
to see whether the annotation had been updated. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data and code availability. The Fitness Browser (http://fit.genomics.lbl.gov) 
provides access to information from the successful fitness experiments, including 
details of the experimental conditions, quality metrics for each experiment, 
per-strain fitness values, gene fitness scores, and t values. The Fitness Browser is 
archived at https://doi.org/10.6084/m9.figshare.5134840. 
(This archive includes a SQLite relational database, a BLAST 
database of protein sequences, and tab-delimited files.) The Fitness Browser also 
contains the original gene annotations for each of the 32 bacteria we studied; the 
predicted protein sequences for the annotated genes; the results of various com- 
parative genomics analyses (PFam, TIGRFam, SEED/RAST, and KEGG); and 
re-annotations based on the fitness data (Supplementary Note 5 and Supplementary 
Table 12). The Fitness Browser also includes functional associations, but as more 
fitness experiments are conducted, the functional associations in the Fitness 
Browser may diverge from the analyses in this manuscript. 

Most analyses in this manuscript are available from 
http://genomics.lbl.gov/supplemental/bigfit 
(archived at https://doi.org/10.6084/m9.figshare.5134837). 
The R image contains all of the information in the Fitness Browser except for the 
per-strain fitness values. The R image also reports: how many insertions were 
found in each protein-coding gene; the likely-essential genes; metadata and quality 
metrics for the unsuccessful experiments; gene fitness values from control (start) 
experiments; which genes had statistically significant phenotypes; the conserved 
functional associations; the classification of how informative each protein's anno- 
tation is; the identifiers and BLAST scores for proteins with hypothetical or vague 
annotations from diverse bacteria; and the cofitness of pairs of genes with the same 
TIGR subroles or different TIGR subroles. The website and the archive also include: 
genome sequences; the mapping between the DNA barcode and the insertion 
location for each pool of transposon mutants; and which mutant strains were used 
for computing gene fitness. The analysis of cofitness values based on scaling by 
the number of generations (or not) is archived separately 
(https://doi.org/10.6084/m9.figshare.5146309.v1). 

The BarSeq or TnSeq reads were analysed with the RB-TnSeq scripts 
(https://bitbucket.org/berkeleylab/feba); we used statistics versions 1.0.3, 1.1.0, or 1.1.1 of 
the code; and version 32 x 1 of BLAT. The R image was derived from these results 
using R code that is included in the archive. 

All genomes sequenced for this study have been deposited in GenBank under 
the following accession numbers: Acidovorax sp. GW101-3H11 (LUKZ01000000), 
P. fluorescens FW300-N1B4 (LUKJ01000000), P. fluorescens FW300-N2C3 
(CP012831.1), P. fluorescens FW300-N2E2 (SAMN03294340), P. fluorescens 
FW300-N2E3 (CP012830.1), P. fluorescens GW456-L13 (LKBJ00000000.1) and 
S. koreensis DSMZ 15582 (PGEN01000001.1). 

Strain availability. Most of the wild-type bacteria (25 of 32), including the eight 
new isolates from the ORNL FRC, are available from public stock centres. The eight 
new isolates are available from DSMZ under the following accession numbers: 
Acidovorax sp. GW101-3H11 (DSM 106133), C. basilensis 4G11 (DSM 106286), 
Pedobacter sp. GW460-11-11-14-LB5 (DSM 106514), P. fluorescens FW300-N1B4 
(DSM 106120), P. fluorescens FW300-N2C3 (DSM 106121), P. fluorescens 
FW300-N2E2 (DSM 106119), P. fluorescens FW300-N2E3 (DSM 106124) and P. fluorescens 
GW456-L13 (DSM 106123). The other seven bacteria were gifts from individuals 
(Supplementary Table 14). Wild-type bacteria, mutant libraries, and the individual 
mutant strains that we generated for follow-up studies are available upon request. 

Extended Data Fig. 1 | Examples of nitrogen source and stress fitness 
experiments. a, The utilization of D-alanine or cytosine by Azospirillum 
brasilense Sp245. Each point shows the fitness of a gene in the two 
conditions. The data are the average of two biological replicates for each 
nitrogen source. Amino acid synthesis genes were identified using the 
top-level role in TIGRFAMs. The genes for D-alanine utilization were a 
D-amino acid dehydrogenase (AZOBR_RS08020), an ABC transporter 
operon (AZOBR_RS08235:RS08260), and a LysR family regulator 
(AZOBR_RS21915). The genes for cytosine utilization were cytosine 
deaminase (AZOBR_RS31895) and two ABC transporter operons 
(AZOBR_RS06950:RS06965 and AZOBR_RS31875:RS31885). 
b, Zinc stress in S. loihica PV-4. We compare fitness in rich medium with 
added zinc(II) sulfate to fitness in plain rich medium. The LB data are 
the average of two biological replicates. The highlighted genes include a 
putative heavy metal efflux pump (CzcCBA or Shew_3358:Shew_3356), a 
hypothetical protein at the beginning of the czc operon (CzcX), a 
zinc-responsive regulator (ZntR or Shew_3411), and another heavy 
metal efflux gene related to arsP or DUF318 (Shew_3410). CzcX lacks 
homology to any characterized protein, but homologues in other strains of 
Shewanella are also specifically important for resisting zinc stress. In both 
panels, the lines show x = 0, y = 0, and x = y. 

Extended Data Fig. 2 | Phenotypes versus types of genes. We categorized 
proteins in our data set by their type of annotation or by whether they 
have homologues in the same genome (‘paralogues’). For each category, we 
show the fraction of genes that have statistically significant phenotypes, 
and more specifically the fractions that have strong phenotypes 
(|fitness| > 2 and statistically significant) or are significantly detrimental 
to fitness (fitness > 0). Genes with high or moderate similarity to another 
gene in the same genome (paralogues with alignment score above 30% 
of the self-alignment score) were less likely to have a phenotype (25% 
versus 32%, P< 10°, Fisher’s exact test), which is likely to reflect genetic 
redundancy. 

Extended Data Fig. 3 | Known DNA repair genes are important for 
cisplatin resistance. We compared the growth of a gene deletion strain 
and the wild-type bacterium under varying cisplatin concentrations. We 
show all replicate growth curves for each genotype. We believe the higher 
overall growth for some of the wild-type experiments (for example, top 
middle) is random. We observe this phenomenon consistently for some 
bacteria and we speculate that this is due to varying oxygen content across 
the microplate. a, E. coli radD (n = 6 independent experiments per strain). 
b, D. shibae Dshi_2244 (n = 3 independent experiments for wild-type and 
n = 6 independent experiments for the mutant). c, Phaeobacter inhibens 
PGA1_c08960 (n = 4 independent experiments for wild-type and n = 6 
independent experiments for the mutant). Dshi_2244 and PGA1_c08960 
are orthologues of MmcB (DUF1052) from C. crescentus. 


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




[Extended Data Fig. 4 growth curves; x axis, hours. a, 0, 0.05 and 0.1 mg/ml cisplatin: wild type, cycA−, endA−, with 95% c.i. b, No cisplatin, 0.031 and 0.062 mg/ml cisplatin: wild type, SO4008−. c, No cisplatin and 0.031 mg/ml cisplatin: wild type, Psest_2235−, Psest_1636−.] 


Extended Data Fig. 4 | EndA, DUF3584, and a FAN1-like VRR-NUC 
domain protein are important for cisplatin resistance. As in Extended 
Data Fig. 3, comparing cisplatin sensitivity of a gene deletion mutant to 
the wild-type bacterium. a, E. coli endA knockout. cycA encodes an amino 
acid transporter, is not expected to have a phenotype on cisplatin, 
and is used as a control. Each growth curve is the average of 12 replicate 
wells and the dashed lines show 95% confidence intervals from the t-test. 
b, A deletion of S. oneidensis MR-1 SO4008, a member of the DUF3584 
protein family (n = 6 independent experiments per strain). c, A deletion of 
P. stutzeri RCH2 Psest_2235 (n = 4 independent experiments per strain). 
Psest_1636 is not expected to be involved in DNA repair and is used here 
as a control. Psest_2235 is a FAN1-like VRR-NUC domain protein. 

Extended Data Fig. 5 | The nuclease domain of EndA is important for 
cisplatin resistance. We assayed the growth of an E. coli endA− Keio 
collection deletion mutant carrying one of three different vectors: an 
empty vector with no insert (endA− + empty), a complementation 
vector carrying a wild-type copy of endA (endA− + endA), and a 
complementation vector with a mutant version of endA with an alanine 
at position 84 instead of histidine (endA− + mutant endA). A mutation 
of this conserved histidine residue in a close homologue from Vibrio 
vulnificus has been reported to eliminate nearly all nuclease catalytic 
activity. As a control, we assayed the wild-type, parental E. coli strain 
carrying a vector with no insert (wt + empty). We performed these growth 
assays on three separate microplates (Plate #1, #2, #3). n = 3 independent 
experiments per strain in Plate #1; n = 4 independent experiments per 
strain in Plates #2 and #3. We added 20 µg ml−1 gentamicin to each assay 
to maintain selection for the plasmids (pBBR1-MCS5 and derivatives). 
Although the catalytic activity of EndA (endonuclease I) appears to 
be important for resisting cisplatin, it is not clear how EndA would be 
involved in DNA repair if it is located in the periplasm, as previously 
believed. We speculate that EndA relocates to the cytoplasm upon 
DNA damage or that EndA degrades broken DNA that enters the 
periplasm and would otherwise damage the membrane. 

Extended Data Fig. 6 | Members of protein family UPF0126 are 
important for growth on glycine. Growth comparison of gene deletion 
mutants in UPF0126 versus wild-type bacteria in minimal defined 
medium. a, SO1319 from S. oneidensis MR-1, with either ammonium 
chloride (n = 6 independent experiments per strain) or glycine as the 
sole source of nitrogen (n = 12 independent experiments per strain). 
b, PGA1_c00920 from P. inhibens, with glycine as the sole source of carbon 
(n = 8 independent experiments for wild-type and n = 16 independent 
experiments for the mutant). c, Psest_1636 from P. stutzeri RCH2, with 
either ammonium chloride (n = 4 independent experiments per strain) 
or glycine (n = 8 independent experiments per strain) as the sole source 
of nitrogen. The Psest_2235 deletion strain is used as a control and is not 
expected to have a phenotype in these conditions. 

Extended Data Fig. 7 | PGA1_c00920 partially rescues the glycine 
growth defect of an E. coli cycA mutant. CycA is a glycine transporter 
from E. coli and a mutant in this gene has reduced uptake of glycine. 
We investigated whether a member of the UPF0126 protein family could 
rescue the glycine growth defect of an E. coli cycA deletion strain. We 
introduced different plasmids into the E. coli cycA Keio collection deletion 
background: an empty plasmid with no insert (cycA− + empty), a plasmid 
with a wild-type allele of the E. coli cycA gene (cycA− + cycA), and a 
plasmid with PGA1_c00920 from P. inhibens (cycA− + PGA1_c00920). 
We compared the growth of these strains and a wild-type E. coli control 
(wt + empty) in defined media with either ammonium chloride (n = 2 
independent experiments per strain) or glycine as the sole source of 
nitrogen (n = 4 independent experiments per strain). PGA1_c00920 
partially rescues the glycine-specific growth defect of the cycA− deletion 
strain. 

Extended Data Fig. 8 | Overexpression of members of protein family 
UPF0060 confers resistance to thallium. We introduced three plasmids 
into wild-type E. coli: a plasmid control with no insert (Empty vector), a 
plasmid carrying RR42_RS34240 from C. basilensis 4G11, and a plasmid 
carrying Pf6N2E2_2547 from P. fluorescens FW300-N2E2. We assayed 
the growth of these strains in LB at 30 °C with varying concentrations of 
thallium(I) acetate (n = 6 independent experiments per strain). We added 
50 µg ml−1 kanamycin to each assay to maintain selection for the plasmids 
(pFAB2286 and derivatives). RR42_RS34240 and Pf6N2E2_2547 are 
members of the UPF0060 protein family. 

Extended Data Fig. 9 | Relevance to all bacteria. We selected 2,593 
hypothetical or vaguely annotated proteins from diverse bacterial species, 
compared them to the protein-coding genes for which we have fitness data 
(using protein BLAST), and identified potential orthologues as best hits 
that were homologous over at least 75% of each protein’s length. We show 
the fraction of these proteins that have a potential orthologue with each 
type of phenotype and that is above a given level of amino acid sequence 
similarity. Similarity was defined as the ratio of the alignment’s bit score to 
the score from aligning the query to itself. 
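
The similarity ratio and the 75% coverage requirement in this caption reduce to two small calculations. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes coverage is the aligned length divided by the full protein length, checked separately for the query and the hit.

```python
def similarity(bit_score, self_bit_score):
    """Ratio of the alignment's bit score to the query's self-alignment bit score."""
    return bit_score / self_bit_score


def covers(aligned_length, protein_length, min_coverage=0.75):
    """True if the alignment spans at least min_coverage of the protein."""
    return aligned_length / protein_length >= min_coverage


def is_potential_orthologue(query_aligned, query_length, hit_aligned, hit_length):
    """Require the best hit to be homologous over >= 75% of each protein's length."""
    return covers(query_aligned, query_length) and covers(hit_aligned, hit_length)
```

For instance, a hit with a bit score of 240 against a query whose self-alignment scores 600 has a similarity of 0.4 under this definition.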

Extended Data Fig. 10 | Alternative ways of computing cofitness. a, The 
effect of rescaling the cofitness values by the number of generations in 
six bacteria. For each of the six bacteria, we identified all pairs of 
protein-coding genes that were assigned to the same TIGR subrole, were more 
than 20 kb apart, and had fitness data. This gave 1,711-9,406 pairs per 
bacterium. We also selected a random subset of pairs that were assigned to 
different TIGR subroles, were more than 20 kb apart, and had fitness data 
(1,559-8,881 pairs per bacterium). For each pair, we compared the original 
cofitness values to the rescaled cofitness (computed from fitness values 
that were divided by the number of generations). b, The effect of averaging 
fitness scores from replicate experiments on the cofitness values. 
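
The rescaling comparison in panel a can be illustrated as follows. This is a sketch under an explicit assumption: cofitness is taken to be the Pearson correlation between two genes' fitness values across experiments, which this caption does not restate. The function names and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rescale(fitness, generations):
    """Divide each experiment's fitness value by that experiment's generations."""
    return [f / g for f, g in zip(fitness, generations)]


# Hypothetical fitness values for two genes across four experiments.
gene_a = [-2.0, -1.0, 0.0, 1.0]
gene_b = [-4.0, -2.0, 0.0, 2.0]
generations = [4.0, 5.0, 6.0, 8.0]

original = pearson(gene_a, gene_b)
rescaled = pearson(rescale(gene_a, generations), rescale(gene_b, generations))
```

Comparing original with rescaled for each gene pair mirrors the comparison described in the caption.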

ANKRD16 prevents neuron loss caused 
by an editing-defective tRNA synthetase 

My-Nuong Vo, Markus Terrey, Jeong Woong Lee, Bappaditya Roy, James J. Moresco, Litao Sun, Hongjun Fu, 
Qi Liu, Thomas G. Weber, John R. Yates III, Kurt Fredrick, Paul Schimmel* & Susan L. Ackerman* 


Editing domains of aminoacyl tRNA synthetases correct tRNA charging errors to maintain translational fidelity. 
A mutation in the editing domain of alanyl tRNA synthetase (AlaRS) in Aars^sti mutant mice results in an increase in 
the production of serine-mischarged tRNA^Ala and the degeneration of cerebellar Purkinje cells. Here, using positional 
cloning, we identified Ankrd16, a gene that acts epistatically with the Aars^sti mutation to attenuate neurodegeneration. 
ANKRD16, a vertebrate-specific protein that contains ankyrin repeats, binds directly to the catalytic domain of AlaRS. 
Serine that is misactivated by AlaRS is captured by the lysine side chains of ANKRD16, which prevents the charging 
of serine adenylates to tRNA^Ala and precludes serine misincorporation in nascent peptides. The deletion of Ankrd16 in 
the brains of Aars^sti/sti mice causes widespread protein aggregation and neuron loss. These results identify an 
amino-acid-accepting co-regulator of tRNA synthetase editing as a new layer of the machinery that is essential to the prevention 
of severe pathologies that arise from defects in editing. 


Accurate aminoacylation of tRNAs by tRNA synthetases establishes 
the universal genetic code and occurs in two steps: the activation of 
the selected amino acid with ATP to form an aminoacyl adenylate, 
and the transfer of the aminoacyl group of the adenylate to the 2′- or 
3′-OH of the cognate tRNA. However, structural similarities between 
some amino acids allow for misactivation of non-cognate amino acids 
and subsequent misacylation of tRNA. These errors can be corrected 
by the hydrolytic editing functions found in many tRNA synthetases. 
Editing can occur after amino acid misactivation but before aminoacyl 
transfer, or after the aminoacylation of tRNA, known as pre-transfer 
and post-transfer editing, respectively. Pre-transfer editing occurs at 
either the active site of aminoacylation or at a distinct editing site, 
whereas post-transfer editing occurs only at the editing site of these 
enzymes. 

The importance of tRNA synthetase editing, including editing 
by the alanyl tRNA synthetase (AlaRS, which is encoded by Aars in 
mice), has been shown in several organisms. Although AlaRS 
misactivates both glycine and serine, the misactivation of serine seems 
to have more serious consequences, perhaps due to the presence of 
D-aminoacyl-tRNA deacylases that act on Gly-tRNA^Ala. Previously 
we showed that sticky (sti) mutant mice that have a point mutation 
(A734E) in the editing domain of AlaRS have ubiquitinated protein 
aggregates in cerebellar Purkinje cells, resulting in the degeneration of 
these neurons. The sti mutation results in only a twofold increase in 
the generation of Ser-tRNA^Ala, demonstrating that Purkinje cells are 
particularly sensitive to the loss of AlaRS editing. 

Here we identified a new vertebrate-specific gene, Ankrd16, which 
greatly attenuates misfolded aggregate formation and Purkinje cell 
degeneration in Aars^sti/sti mice. The deletion of Ankrd16 in other 
Aars^sti/sti neurons also caused the formation of ubiquitinated protein 
aggregates and neuron death. ANKRD16 binds to AlaRS and prevents 
the misincorporation of serine at alanine codons in nascent peptides 
by stimulating serine-dependent ATP hydrolysis before tRNA 
aminoacylation via the acceptance of misactivated serines. Our data 
collectively reveal ANKRD16 as a co-regulator of AlaRS that protects against 
assaults on translation fidelity and proteostasis in mammalian neurons. 


Ankrd16 suppresses Purkinje cell loss in sticky mice 
Aars^sti/sti mice on the inbred C57BL/6J (B6) genetic background are 
ataxic, with cerebellar Purkinje cell degeneration beginning at three 
weeks of age. However, neither ataxia nor Purkinje cell degeneration 
was apparent in nine out of ten 12-16-month-old F2 Aars^sti/sti mice 
from a B6.Aars^sti/+ and CAST/Ei (CAST) mating, suggesting that 
CAST-derived alleles could suppress neuron loss. In agreement, two 
phenotypically distinct classes of N2 Aars^sti/sti offspring (generated from 
a backcross of F1 B6/CAST Aars^sti/+ mice to B6.Aars^sti/+ mice) were 
observed at equal frequencies: mice with ataxia and extensive Purkinje 
cell loss similar to that of B6.Aars^sti/sti mice, and mice without ataxia 
and with little Purkinje cell loss (Extended Data Fig. 1a). Suppression of the 
neurodegenerative phenotype was observed in 50% of backcross mice 
from crosses to CASA/RkJ (CASA, a strain closely related to CAST/Ei), 
but not in other crosses, which suggests that a single dominant allele 
from the CAST or CASA strains can suppress sti-mediated 
neurodegeneration (Extended Data Fig. 1a). 
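
The inference that equal frequencies point to a single dominant allele can be checked with a small Mendelian enumeration. The sketch below is illustrative and makes its assumptions explicit: it models one hypothetical suppressor locus with alleles labelled "CAST" and "B6", an F1 parent heterozygous at that locus, a B6 parent homozygous for the B6 allele, and a dominant suppressor.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_a, parent_b):
    """All equally likely offspring genotypes at one locus (one allele per parent)."""
    return [tuple(sorted(pair)) for pair in product(parent_a, parent_b)]


def fraction_suppressed(parent_a, parent_b, suppressor="CAST"):
    """Fraction of offspring carrying at least one copy of a dominant suppressor."""
    offspring = offspring_genotypes(parent_a, parent_b)
    carriers = [genotype for genotype in offspring if suppressor in genotype]
    return Fraction(len(carriers), len(offspring))


f1_parent = ("CAST", "B6")   # heterozygous at the hypothetical suppressor locus
b6_parent = ("B6", "B6")     # backcross parent
print(fraction_suppressed(f1_parent, b6_parent))  # 1/2
```

Half of the backcross offspring inherit the CAST allele in this model, which matches the equal frequencies of the two phenotypic classes reported above.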

Genome scans of affected and unaffected N2 Aars^sti/sti mice revealed 
that the Purkinje cell degeneration was suppressed in mice that were 
heterozygous for CAST alleles on proximal chromosome 2 (Extended 
Data Fig. 1b). This locus (around 3.3 Mb) also segregated with the 
Aars^sti modifier gene from CASA, which suggests that these strains 
share the same suppressor gene. Transfer of the modifier of sticky 
(Msti) locus from the CAST genome onto the B6.Aars^sti/sti genetic 
background rescued the decreased latency to fall in a rotarod test of 
three-month-old B6.Aars^sti/sti mice and markedly reduced Purkinje cell 
death, although some neurons in the rostral cerebellum of these mice 
still degenerated (Fig. 1a, b). Heterozygosity and homozygosity for 
CAST alleles in the Msti region similarly attenuated Purkinje cell loss, 
in agreement with the dominant nature of Msti (Fig. 1b). 

Fig. 1 | Modifier of sticky (Msti) suppresses Aars^sti-mediated 
neurodegeneration. a, Results of rotarod experiments on 
three-month-old mice. Data are mean ± s.d., n = 15 or 16 mice per genotype, 
two-way analysis of variance (ANOVA; Tukey correction), **P < 0.01, 
***P < 0.001. b, Calbindin D-28 immunohistochemistry images of the 
indicated genotypes. Sections were counterstained with haematoxylin. 
n = 5 biological replicates. c, The percentage of Purkinje cells with 
aggregates in mice at different ages. Bars and error bars are mean ± s.d., 
n = 3, black circles represent individual data points. Multiple t-tests 
(Holm-Sidak method), **P = 0.00148 (6 weeks), P = 0.00498 (12 weeks), 
***P = 7.819 x 10% Scale bars, 500 µm. 


Ubiquitin punctae characteristic of protein inclusions were observed 
in many Purkinje cells in four-week-old B6.Aars^sti/sti mice, but not in 
B6 or CAST mice (data not shown). As expected owing to the 
progressive Purkinje cell loss observed in sti mutant mice, the number of 
Purkinje cells that contained inclusions decreased by 6 and 12 weeks 
of age. However, very few Purkinje cells in B6.Msti^CAST/B6 Aars^sti/sti mice 
contained inclusions, even at later ages, which confirms the long-term 
protection of Purkinje cells (Fig. 1c). 

We further localized Msti to a 0.63-Mb region, and the coding 
regions of the protein-coding genes in this interval were amplified by 
PCR with reverse transcription (RT-PCR) from B6 and CAST 
cerebellar RNA and sequenced (Fig. 2a). Non-synonymous single nucleotide 
polymorphisms (SNPs) were observed in Il2ra, Fbxo18 and Ankrd16 
(Fig. 2a). However, with the exception of Il2ra, these SNPs were either 
not present in CASA (Fbxo18), which is an Aars^sti modifying strain, 
or were also present in MOLF (Ankrd16), a strain in which 
neurodegeneration is not rescued (Extended Data Fig. 1a, c). 

A strain-specific Ankrd16 transcript that contained a 138-bp cryptic 
exon (exon 5′) was also observed in B6 (with or without the Aars^sti/sti 
mutation) and other non-rescuing strains, but was absent in CAST or 
CASA (Fig. 2b, Extended Data Fig. 1d, e). Sequence analysis of intron 
5 suggested that the inclusion of this exon was due to an SNP that 
generated a novel alternative splice site (Fig. 2c, Extended Data Fig. 1f). 
Transcript levels of the correctly spliced Ankrd16 isoform (lacking 
exon 5′) were 3.3-fold and 5.3-fold higher in the congenic Msti^CAST/B6 
or CAST cerebellum relative to B6, respectively (Fig. 2d). Protein levels 
were 2.8 ± 0.2-fold and 3.9 ± 0.22-fold higher in the B6.Msti^CAST/B6 and 
CAST cerebellum, respectively, and the Aars^sti mutation did not alter 
Ankrd16 splicing or protein levels (Fig. 2e, Extended Data Fig. 1e, g). 
Increased ANKRD16 levels were also observed in other tissues from 
Msti^CAST/B6 mice (Extended Data Fig. 1h). The Ankrd16 transcript with 
exon 5′ contains a premature stop codon and is predicted to undergo 
nonsense-mediated decay; in agreement, truncated forms of ANKRD16 
were not observed (Fig. 2e, Extended Data Fig. 1g). 


Fig. 2 | Ankrd16 is the modifier of Aars^sti/sti. 

Fig. 3 | ANKRD16 interacts with AlaRS and prevents serine-mediated 
cell death. a, Co-IP experiments and western blotting reveal the 
AlaRS-ANKRD16 interaction in the brain. n = 4 biological replicates. 
b, Co-IP experiments in HEK293T cells using mouse Myc-AlaRS domain 
constructs. n = 2 independent experiments. c, Pull-down of purified 
AlaRS, TyrRS or TrpRS with ANKRD16. n = 2 independent experiments. 
d, Cell death of embryonic fibroblasts. Data are mean ± s.d., n = 3, 
two-way ANOVA (Tukey correction), ***P < 0.001, ****P < 0.0001. e, 
AlaRS(C666A/Q584H) E. coli expressing mouse Ankrd16 or Ankrd29. 
n = 3 independent experiments. aa, amino acids; AAD, aminoacylation 
domain; ED, editing domain; IB, immunoblot; IP, immunoprecipitation. 


Neurodegeneration in Aars^sti/sti mice was modified neither by 
transgenic expression of the CAST Il2ra cDNA nor by deletion of 
Il2ra (data not shown). However, suppression of Aars^sti/sti-mediated 
Purkinje cell loss was observed in mutant mice that carried a CAST 
BAC transgene containing the 3′-portion of the Il15ra gene, and the 
Fbxo18 and Ankrd16 genes (Fig. 2f). By contrast, another transgenic line 
(Tg25L9-19) generated with this BAC in which Ankrd16 was deleted 
upon integration was not able to suppress Purkinje cell degeneration 
(Fig. 2f, Extended Data Fig. 1i). Indeed, transgenic expression of the 
Ankrd16^CAST coding sequence using the Purkinje cell-specific 
promoter Pcp2 in B6.Aars^sti/sti mice suppressed the death of these neurons 
(Fig. 2g). 


ANKRD16 binds AlaRS to prevent mistranslation 

Ankrd16 encodes a 39-kDa protein of unknown function that is 
composed of nine repeats of the ankyrin protein-protein 
interaction domain and is found only in vertebrates (Extended Data 
Fig. 2a, Extended Data Fig. 1j). To identify proteins that interact 
with ANKRD16, we transgenically expressed Ankrd16-Myc under 
the control of the chicken beta-actin promoter and cytomegalovirus 
enhancer. Co-immunoprecipitation (co-IP) of ANKRD16 was 
performed using livers of transgenic and non-transgenic mice, and the 
immunoprecipitates were analysed by liquid chromatography-tandem 
mass spectrometry (LC-MS/MS). Notably, AlaRS was the most 
abundant protein after the ANKRD16 bait protein (Extended Data Fig. 2a). 
Interaction of ANKRD16 with wild-type or mutant AlaRS in both 
the liver and the brain was confirmed by co-IP and western blotting 
(data not shown and Fig. 3a). Interactions between ANKRD16 and 
AlaRS were independent of both the epitope tag and the bait protein 
(Extended Data Fig. 2b, c). 

Co-IP experiments demonstrated that the AlaRS aminoacylation 
domain was efficiently co-immunoprecipitated by ANKRD16, and 
little interaction was observed in the absence of this domain (Fig. 3b, 
Extended Data Fig. 2b). Affinity-capture experiments using purified 
AlaRS protein further demonstrated that the AlaRS catalytic domain 
was sufficient for interaction with ANKRD16 and that this interaction 
was direct (Extended Data Fig. 2e, Fig. 3c). Furthermore, AlaRS did 
not co-immunoprecipitate ANKRD29, a protein that comprises eight 
ankyrin repeats (Extended Data Fig. 2d), nor did ANKRD16 
interact with tyrosyl-tRNA synthetase or tryptophanyl-tRNA synthetase 
(Fig. 3c). This suggests a specific ANKRD16-AlaRS interaction, with 
similar binding dynamics for both wild-type AlaRS and AlaRS(A734E) 
(Extended Data Fig. 2f). 

The AlaRS(A734E) mutation leads to increased death of Aars^sti/sti 
embryonic fibroblasts when cultured with increasing concentrations of 
serine. Msti^CAST/B6 Aars^sti/sti fibroblasts were more resistant to high serine 
concentrations than were B6.Aars^sti/sti fibroblasts; at the highest 
concentrations (40 mM), less cell death was observed in Msti^CAST/B6 Aars^sti/sti 
than in B6 (Msti^B6/B6) fibroblasts, which demonstrates that Ankrd16 
suppresses serine-mediated cell death in Aars^sti cells (Fig. 3d). 

Although ANKRD16 is a vertebrate-specific protein, mammalian 
and Escherichia coli AlaRS are highly conserved with around 67% sim- 
ilarity in the catalytic domain. Indeed, mouse ANKRD16 was able to 
bind E. coli AlaRS in affinity-capture experiments (Fig. 3c), and reduced 
the death of E. coli with a severe editing-domain mutation (C666A/ 
Q584H) in AlaRS when grown on a serine gradient (Fig. 3e). Together, 
these results suggest that ANKRD16 influences the function of AlaRS. 

Next, we used dipeptide formation on defined ribosome complexes 
as a read-out of charging and decoding fidelity. These experiments 
sufficiently detected serine mistranslation and AlaRS-mediated editing 
of Ser-tRNA^Ala (Extended Data Fig. 3a, b). When reactions using
Ser-tRNA^Ala and AlaRS were supplemented with alanine, a loss of
fMet-Ser (fMet, N-formylmethionine) coincided with a gain of fMet-Ala
(Fig. 4a), which is consistent with the reported post-transfer editing by
AlaRS. However, no obvious differences were observed between reactions
using wild-type AlaRS or AlaRS(A734E), and ANKRD16 had no
effect on either reaction in these experiments using mischarged tRNA. 

By contrast, when deacylated tRNA^Ala, serine and ATP were used,
AlaRS(A734E) generated more fMet-Ser than did wild-type AlaRS, and
ANKRD16 reduced the amount of this aberrant dipeptide in reactions
with either AlaRS(A734E) or wild-type AlaRS (Extended Data Fig. 3c).
Although ANKRD16 interacts with a truncated AlaRS (AlaRS(1-455))
(Extended Data Fig. 2f), no reduction in the amount of aberrant dipep- 
tide was observed, which suggests that the editing and/or C-terminal 
domains of AlaRS may contribute to the function of ANKRD16. 
Furthermore, in the presence of ATP and a mixture of serine and 
alanine, ANKRD16 specifically decreased the formation of fMet-Ser,
thereby increasing AlaRS(A734E) fidelity (that is, fMet-Ala/fMet-Ser)
by a factor of around 20 (Fig. 4b). Therefore, ANKRD16 acts before the
formation of mischarged tRNA to enhance translational fidelity. 
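
The fidelity measure used above is a simple ratio of correct to aberrant dipeptide. A minimal Python sketch of that arithmetic follows; the function name `fidelity` and the band intensities are hypothetical values chosen only to illustrate a roughly 20-fold change, and are not data from this study.

```python
def fidelity(fmet_ala: float, fmet_ser: float) -> float:
    """Return the ratio of correct (fMet-Ala) to aberrant (fMet-Ser) dipeptide."""
    if fmet_ser <= 0:
        raise ValueError("fMet-Ser signal must be positive")
    return fmet_ala / fmet_ser


# Hypothetical dipeptide signals in arbitrary units (illustration only).
without_ankrd16 = fidelity(fmet_ala=50.0, fmet_ser=50.0)
with_ankrd16 = fidelity(fmet_ala=50.0, fmet_ser=2.5)

# Fold improvement in fidelity when fMet-Ser alone is reduced.
fold_improvement = with_ankrd16 / without_ankrd16
print(f"fidelity improved {fold_improvement:.0f}-fold")
```

In this sketch the fMet-Ala signal is held constant, so the whole change in the ratio comes from the drop in fMet-Ser, which mirrors the specific decrease described in the text.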

In agreement, ANKRD16 had no effect on subsequent steps of 
tRNA mischarging, including deacylation of mischarged Ser-tRNA^Ala,
transfer of the mischarged tRNA to EF-Tu, or amino acid activation 
or aminoacylation of the cognate amino acid alanine (Extended Data 
Fig. 3d-g). In addition to post-transfer editing activity, pre-transfer 
editing has also been observed in AlaRS in E. coli. Serine-dependent
ATPase activity was also observed with mouse wild-type AlaRS in 
Fig. 4 | ANKRD16 enhances tRNA-independent editing and accepts
serine adenylates from AlaRS. a, Mouse AlaRS (±ANKRD16)
was added to reactions containing Ser-tRNA^Ala and alanine. n = 2
independent experiments. b, Mouse AlaRS (±ANKRD16) was added to
reactions containing deacylated tRNA^Ala, alanine and serine (* indicates
oxidized fMet). n = 2 independent experiments. c, Ser-AMP hydrolysis
at 10 min. Bars and error bars are mean ± s.d., n = 3, black circles
represent individual data points. One-way ANOVA (Tukey correction),
***P < 0.001, ****P < 0.0001. d, Ratio of alanine- or serine-linked
ANKRD16 to aminoacylated tRNA^Ala. Bars and error bars are mean ± s.d.,
n = 3, black circles represent individual data points. Two-tailed Student's


the absence of tRNA (Extended Data Fig. 4a). However, this activity 
was decreased with both AlaRS(A734E) and AlaRS(1-455) (Fig. 4c, 
Extended Data Fig. 4b). ANKRD16 was able to restore that activity 
for AlaRS(A734E), but not for AlaRS(1-455), which further suggests 
that the editing domain contributes to the serine-dependent ATPase 
activity and/or the modulation of this activity by ANKRD16 (Fig. 4c, 
Extended Data Fig. 4b, c). 

Unexpectedly, ANKRD16 appeared to stimulate the tRNA-dependent
precipitation of [3H]serine but not of [3H]alanine in reactions with
AlaRS(A734E) (Extended Data Fig. 4d-f). However, misactivated
[3H]serine was also precipitated in reactions without tRNA^Ala, with
approximately three linked-serine adducts per ANKRD16 (Extended Data
t-test, **P = 0.0011. e, Ser-AMP hydrolysis at 10 min. Bars and error
bars are mean ± s.d., n = 3, black circles represent individual data points.
One-way ANOVA (Tukey correction), ***P < 0.001. Control reactions
are from c. f, AlaRS(C666A/Q584H) E. coli expressing mouse ANKRD16
or ANKRD16(3×R). n = 2 independent experiments. g, Cell death of
embryonic fibroblasts. Bars and error bars are mean ± s.e.m., n = 4, circles
represent individual data points. One-way ANOVA (Tukey correction),
***P < 0.001, ****P < 0.0001. h, Structural model of ANKRD16 and the
aminoacylation domain (AAD) of AlaRS. Modified lysines are shown
in purple, active site (AS) residues in orange, and Ala-AMS (Ala-AMP
analogue) in green.


Fig. 4g). By contrast, [3H]alanine-adduct formation per ANKRD16
with AlaRS(A734E), in the absence of tRNA^Ala, was much lower
(Extended Data Fig. 4h). Furthermore, in the presence of tRNA^Ala and
AlaRS(A734E), alanine linked to ANKRD16 was negligible relative to
that coupled to tRNA^Ala (Fig. 4d).

These results suggest that ANKRD16 functions as an alternative to 
water or tRNA^Ala, in that it accepts misactivated serine. Previous studies
have demonstrated that reactive aminoacyl adenylates may react with
cysteine or lysine residues. Levels of precipitated serine were diminished
only slightly after prolonged exposure to alkaline pH, which hydrolyses
thiol esters (for example, cysteine). As such, the ANKRD16 serine
adduct is probably an amide (for example, lysine) (Extended Data Fig. 4i). 
We used mass spectrometry to identify the residues of ANKRD16
that accept misactivated serine. No spectral shift was observed with
Cys-containing peptides. Although not all Lys-containing peptides
of ANKRD16 were resolved, three highly conserved lysines (K102,
K135 and K165) shifted by 87 Da, the mass of serine (Extended Data
Fig. 1j, Extended Data Fig. 5a-d). Therefore, we mutated those lysine
codons to arginine (ANKRD16(K102R/K135R/K165R), hereafter
denoted ANKRD16(3×R)) (Extended Data Fig. 4g). Circular dichroism
spectroscopy and thermal shift assays suggested that wild-type
ANKRD16 and ANKRD16(3×R) have similar secondary structures
and stabilities (Extended Data Fig. 5e, f), and only a slight change
(approximately twofold) in affinity for AlaRS(A734E) (Extended Data
Figs. 2f, 5g). In contrast to ANKRD16, ANKRD16(3×R) had no effect
on AlaRS(A734E) pre-transfer editing (Fig. 4e, Extended Data Fig. 4b).
Unlike ANKRD16, which rescued the growth of AlaRS(C666A/Q584H)
E. coli at high concentrations of serine, ANKRD16(3×R)
did not rescue the growth of mutant bacteria under these conditions
(Fig. 4f). Similarly, the expression of ANKRD16, but not
ANKRD16(3×R), prevented the serine-induced death of B6.Aars^sti/sti
embryonic fibroblasts (Fig. 4g, Extended Data Fig. 6). Accordingly,
we generated a model for the ANKRD16-AlaRS complex (Fig. 4h).
Modified lysine side chains of ANKRD16 project out from the
helix-loop-helix motifs, with each in close proximity to the active site of
AlaRS, at which serine is misactivated.
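
The 87 Da shift reported above matches the mass that a seryl group adds when serine forms an amide bond with a lysine side chain, that is, the mass of free serine minus one water. A short Python sketch of that calculation follows, using standard monoisotopic atomic masses; the helper name `mass` and the dictionary layout are illustrative assumptions, not part of the analysis pipeline used in this study.

```python
# Standard monoisotopic atomic masses (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mass(formula: dict) -> float:
    """Sum the monoisotopic mass of an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


serine = {"C": 3, "H": 7, "N": 1, "O": 3}  # free serine, C3H7NO3
water = {"H": 2, "O": 1}                   # lost on amide-bond formation

# Net mass added to a lysine-containing peptide by one seryl adduct.
seryl_shift = mass(serine) - mass(water)
print(f"expected shift per serine adduct: {seryl_shift:.3f} Da")
```

The computed shift of about 87.03 Da agrees with the nominal 87 Da shift observed for the K102-, K135- and K165-containing peptides.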


Fig. 5 | Loss of Ankrd16 in Aars^sti/sti mice causes protein aggregation and
neurodegeneration. a, b, Immunofluorescence with antibodies to ANKRD16 (green)
and calbindin D-28 (Calb; red) (a), or ubiquitin (Ub; red) and p62 (green) (b).
n = 3 biological replicates. c, d, Cresyl violet staining of hippocampus (c) and
cortex (d). Biological replicates for c, d: TgCaMKIIa-Cre Ankrd16^fl/- Aars^sti/sti:
1.5 months, n = 5; 3 months, n = 7; 7 months, n = 6; Ankrd16^-/-: 7 months, n = 7;
Aars^sti/sti and wild-type: 7 months, n = 3. Scale bars, 50 μm (a, b, low
magnification; c, higher magnification); 10 μm (a, b, higher magnification);
500 μm (c, low magnification); 100 μm (d). CA, cornu ammonis; DAPI,
4',6-diamidino-2-phenylindole; DG, dentate gyrus; P, postnatal day.
Extensive neuron loss in sticky mice lacking Ankrd16

We performed immunofluorescence experiments to determine whether 
the specificity of neuron death in Aars^sti/sti mice is correlated with the
levels of ANKRD16. ANKRD16 was widely expressed in the brain and
was detected in the nucleus as well as the cytoplasm (Fig. 5a, Extended
Data Fig. 7a, b). Notably, ANKRD16 levels were lower in Purkinje cells
relative to cerebellar granule cells, cells of the cerebellar molecular layer, 
hippocampal neurons, and cells in the cortex (Fig. 5a). 

To test whether the levels of Ankrd16 influence the sensitivity of 
cells to the Aars^sti/sti mutation, we generated an Ankrd16-null allele
(Extended Data Fig. 7c, d). Ankrd16^-/- mutant mice had no obvious
pathologies in the brain or other organs, even when aged to 12 months
(data not shown). However, loss of Ankrd16 in Aars^sti/sti mice resulted
in early embryonic lethality (Extended Data Fig. 7e). These results 
show that the low levels of ANKRD16 present in B6 mice are suffi- 
cient to resolve sti-mediated editing defects induced during embryonic 
development. 

To determine the effect of decreasing ANKRD16 levels on neuronal 
cell survival, we conditionally deleted Ankrd16 in postnatal Purkinje cells 
in Aars^sti/sti mice. Purkinje cell loss in TgPcp2-Cre Ankrd16^fl/- Aars^sti/sti
mice began at about three weeks of age; by four weeks of age, the majority
of Purkinje cells were absent and, in contrast to Aars^sti/sti mice, all
Purkinje cells had degenerated by seven months of age (Extended Data 
Figs. 8a, 9a). Formation of ubiquitin- and p62-positive aggregates was 
also accelerated in Purkinje cells, with aggregates observed in 12.3% of
Purkinje cells compared to 2.5% of these neurons in Aars^sti/sti mice at
three weeks of age (Extended Data Fig. 9a). 

Deletion of Ankrd16 in the embryonic cerebellar primordium 
using En1^Cre resulted in protein aggregates and neuron death in
interneurons in the molecular layer and neurons in the granule cell 
layer as well as in the Purkinje cells (Extended Data Figs. 8b, c, 9b). 
Furthermore, loss of Ankrd16 in postnatal cortical and hippocampal 
neurons also caused their degeneration. Ubiquitin- and p62-posi- 
tive aggregates were observed both in hippocampal pyramidal cells 
(2.80 ± 0.22%) and in cortical neurons in two-month-old TgCaMKIIa-Cre
Ankrd16^fl/- Aars^sti/sti mice (Fig. 5b-d, Extended Data Fig. 9c).
Together, these data demonstrate that neurons other than Purkinje cells 
are also sensitive to the effects of mistranslation, and further suggest 
that ANKRD16 protects against mistranslation in a dose-dependent
fashion. 


Discussion 

In contrast to known editing mechanisms of tRNA synthetases 
or free-standing homologues of editing domains that act autonomously,
tRNA-independent hydrolysis of misactivated serine by
AlaRS is enhanced through the binding of ANKRD16, which captures 
misactivated serine and removes it from the pool for protein synthesis 
(Extended Data Fig. 10a, b). Unlike most tRNA synthetases, AlaRS mis- 
activates amino acids that are both smaller (glycine) and larger (serine) 
than alanine, owing to structural properties that make it difficult for 
AlaRS to exclude misactivated serine. Thus, serine misactivation
by AlaRS, and the toxic effects of serine-for-alanine replacements in 
vertebrates, may present a special situation in which an editing co-fac- 
tor is necessary for proofreading. 

In addition to providing direct mechanistic insights into the editing 
functions of aminoacyl tRNA synthetases, the discovery of Ankrd16 
highlights the importance of studying mRNA translation in higher 
organisms, and may provide understanding for the cell-type sensitivity 
of phenotypes associated with the Aars^sti mutation. More broadly,
cell-type selectivity resulting from ‘monogenic’ mutations in ubiquitous 
genes has been difficult to resolve. Only a few modifier genes of disease 
mutations have been identified in an unbiased approach, and these 
suggest that modifier genes may function in independent, parallel 
pathways. Our identification of Ankrd16 as a modifier of the Aars^sti
mutation demonstrates that this possibility may be an oversimplifi- 
cation, and that restricted pathologies may be due to the expression 
levels of genes that modify the function of the gene harbouring the 
primary mutation. 
METHODS


Ethical compliance. All mouse studies were performed under the guidance of 
the University of California, San Diego and The Jackson Laboratory Animal Care 
and Use Committee in accordance with institutional and regulatory guidelines. 
Mice. Genetic mapping of Msti (modifier of sticky) was performed on N2 and
F2 mice using polymorphic microsatellite and SNP markers (primer sequences
are listed below). In brief, C57BL/6J (B6).Aars^sti/+ mice were intercrossed with
CAST/EiJ (CAST) mice and the resulting F1 Aars^sti/+ mice were either backcrossed
to B6.Aars^sti/+ to generate N2 Aars^sti/sti mice (n = 1,249) or intercrossed
to generate F2 Aars^sti/sti mice (n = 766) for phenotypic assessment. Ataxia and
Purkinje cell degeneration of N2 or F2 Aars^sti/sti mice were assessed at three to four
months of age. To generate Ankrd16^-/- Aars^sti/sti embryos, timed matings between
Ankrd16^-/- Aars^sti/+ and Ankrd16^-/- Aars^sti/+ mice were performed. The day that
a vaginal plug was detected was defined as embryonic day 0.5 (E0.5).

To generate heterozygous congenic B6.CAST-Msti mice (Msti^CAST/B6), B6 and
CAST mice were intercrossed and mice carrying the CAST allele at the Msti flanking 
markers (D2Mit4 and D2Mit427) were backcrossed to B6 mice for seven genera- 
tions. All transgenic mice were generated by pronuclear injection of B6 zygotes. 
Tg25L9 transgenic mice were generated by injection of a CAST-derived BAC clone 
(CH26-25L9). To generate transgenic TgPcp2 Ankrd16 mice, the coding sequence 
of Ankrd16^CAST cDNA was inserted into the BamHI site of the fourth exon of the
L7/Pcp2 minigene. To generate Tg-CAG Ankrd16-Myc transgenic mice, the stop
codon of the Ankrd16^CAST cDNA was replaced with three copies of a myc epitope
tag and cloned into the EcoRI site of the pCAGGS vector.

Generation of the targeted Ankrd16 allele was performed by homologous recom- 
bination of the Ankrd16 locus in R1 embryonic stem cells. Targeted embryonic 
stem cells were confirmed by Southern blotting. Cells were injected into B6J
blastocysts to generate Ankrd16^neo/+ mice, which were backcrossed ten generations to
B6J mice. Ubiquitous deletion of exon 2 of Ankrd16 to generate the Ankrd16^+/-
allele was accomplished by crossing Ankrd16^neo/+ mice to
B6.FVB-Tg(EIIa-cre)C5379Lmgd/J (The Jackson Laboratory, stock #003724). To generate
a conditional Ankrd16^fl/+ allele, the neo cassette was removed by crossing
Ankrd16^neo/+ mice to B6.129S4-Gt(ROSA)26Sor^tm1(FLP1)Dym/RainJ (The Jackson
Laboratory, stock #009086). For conditional loss-of-function experiments, the
following Cre-lines were used: B6.129-Tg(Pcp2-cre)2Mpin/J (The Jackson Laboratory,
stock #004146), En1^tm2(cre)Wrst/J (The Jackson Laboratory, stock #007916),
B6.Cg-Tg(Camk2a-cre)T29-1Stl/J (The Jackson Laboratory, stock #005359).

SNP markers are as follows: D2SlacCA3 forward: 5'GCTAGAAAGATGCTGGTAATGGAA3',
D2SlacCA3 reverse: 5'GGCTGGCTGTGTAAGCACAT3';
D2SlacCA6 forward: 5'AACACACCATACCACACACACA3',
D2SlacCA6 reverse: 5'TGGAGGTTTGCAAAGGAATC3';
D2SlacCA8 forward: 5'ATACCCACCCACATGTGACG3',
D2SlacCA8 reverse: 5'GTACAAGATACTGAGAGTTGGT3';
D2SlacCA10 forward: 5'AGGCTGGAGGACTGTGGGTT3',
D2SlacCA10 reverse: 5'GCGATTGTCCATGGTGGCTT3';
D2SlacCA11 forward: 5'CATCTGGTTTATGTGAATGCC3',
D2SlacCA11 reverse: 5'CAGAGTGCTTTACCTTTCTC3';
D2SlacCA12 forward: 5'AATGGTCAGCCCTGGAAAC3',
D2SlacCA12 reverse: 5'ACATTCAGCCTGCTTGGTCT3';
D2SlacTAGA1 forward: 5'TGGATGAAGATTTTACAGAT3',
D2SlacTAGA1 reverse: 5'ATCTATCTGTCTGTCTATCC3';
Intr. 5 Ank16 forward: 5'GTAGGGTTGGAGAGATAGCC3',
Intr. 5 Ank16 reverse: 5'TGTGAACACAGTAATACAGGTGC3'.

Rotarod assay. Mice were placed on a rotarod linearly accelerating from 4 to 40
r.p.m. (type 7650, Ugo Basile). The latency to fall was averaged from four trials and
repeated over the course of four consecutive test days. A test was considered over
when the mouse fell, 360 s had elapsed, or the mouse used gripping behaviour to remain
on the rod. Rotarod experiments were performed with mice of both sexes and at 
least 15 mice per genotype (Msti^B6/B6 Aars^sti/+, n = 15; Msti^B6/B6 Aars^sti/sti, n = 16;
Msti^CAST/B6 Aars^sti/sti, n = 15). Data represent mean ± s.d. and were analysed by
two-way ANOVA using Tukey's multiple comparison tests.

Histology and immunohistochemistry. Anesthetized mice were transcardially 
perfused with 4% paraformaldehyde (PFA, for immunofluorescence and histology) 
or acetic acid/methanol (for calbindin-D28 immunohistochemistry and immuno- 
fluorescence; for example, aggregate staining). Tissues were post-fixed overnight 
and embedded in paraffin. For histological analysis, sections were deparaffinized, 
rehydrated and stained with cresyl violet according to standard procedures. For 
immunofluorescence, antigen retrieval on paraformaldehyde-fixed sections was 
performed by microwaving sections in 0.01 M sodium citrate buffer (pH 6.0, 0.05% 
Tween-20) three times for 3 min each. Paraformaldehyde- or acetic acid/methanol-fixed
sections were incubated with the following primary antibodies: rabbit
anti-ubiquitin (DAKO, # Z0458, 1:1,000), mouse anti-ubiquitin (Cell Signaling, 
#3936, 1:350), guinea pig anti-p62 (ARP, 03-GP62-C, 1:300), rabbit anti-Calbindin 
D28 (Swant, CB38, 1:300), and guinea pig anti-ANKRD16 (1:200). Guinea pig or 
rabbit anti-ANKRD16 antibodies were custom-made against recombinant mouse
ANKRD16 (1-154 aa) and affinity purified. Detection of primary antibodies was 
performed with Alexa Fluor-488, -555 or -350 conjugated goat anti-mouse, -rabbit
or -guinea pig secondary antibodies (Invitrogen). Sections were counterstained 
with DAPI, and autofluorescence was quenched with Sudan Black. 

For aggregate quantification, the total number of Purkinje cells or hippocam- 
pal pyramidal neurons and ubiquitin- or p62-positive neurons were quantified 
using three sections spaced 100 μm apart per mouse. The percentage of cells with
aggregates ± s.d. at different time points were compared between genotypes using
multiple t-tests and corrected for multiple comparisons using the Holm-Sidak 
correction (Fig. 1c and Extended Data Fig. 9a). All histological analyses were 
performed with at least three mice of each genotype and time point, using mice 
of both sexes. 
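
As an illustration of the multiple-comparison step described above, the following Python sketch applies a step-down Holm-Sidak adjustment to a list of P values. The function name `holm_sidak` and the example P values are hypothetical, and the statistics package used for the actual analysis may differ in implementation details.

```python
def holm_sidak(pvalues):
    """Return Holm-Sidak step-down adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment over the (m - rank) hypotheses still in play.
        adj = 1.0 - (1.0 - pvalues[idx]) ** (m - rank)
        # Enforce monotonicity: an adjusted P value never falls below
        # the adjusted value of a smaller raw P value.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical raw P values from three time-point comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_sidak(raw)
for p, q in zip(raw, adjusted):
    print(f"raw P = {p:.3f}, adjusted P = {q:.4f}")
```

A comparison is then called significant when its adjusted P value falls below the chosen threshold, for example 0.05.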

Constructs. For the generation of the various Aars constructs, a full-length 
mouse Aars clone (Open Biosystems/GE Dharmacon) was used as a template. 
The wild-type mouse Aars sequence was subcloned and inserted into a mamma- 
lian expression vector (pCMV3, Agilent Technologies) via restriction digestion. 
To generate the mutant AlaRS(A734E) construct, the mutant editing domain of 
Aars was amplified by PCR from total brain cDNA of B6.Aars^sti/sti, TOPO subcloned
(TOPO TA cloning kit, ThermoFisher, K450002), and the wild-type editing
domain of pCMV3-Aars was replaced with the mutant editing domain (pCMV3- 
AlaRS(A734E)) via restriction digestion. In addition, wild-type and mutant Aars 
(pCMV3) were digested and inserted into pCruz HA (C) vector (HA, human influ- 
enza haemagglutinin; Santa Cruz, sc-5045). To generate AlaRS deletion constructs 
(pCMV3-AAD-Aars, pCMV3-ΔAAD-Aars, pCMV3-ΔAAD-AlaRS(A734E)),
individual domains of AlaRS were amplified by PCR using either full-length wild- 
type or mutant Aars constructs, TOPO subcloned, and inserted into pCMV3 vec- 
tor via restriction digestion. The TgCAG Ankrd16-Myc construct was used to 
amplify Ankrd16 by PCR, whereas Ankrd29 was amplified from total brain B6J 
cDNA. PCR products were TOPO subcloned and then inserted into pCMV3 via 
restriction digestion. QuickChange II Site-Directed Mutagenesis kit (Agilent, 
#20052) was used to mutate lysines 102, 135 and 165 in ANKRD16 to arginines 
(ANKRD16(3×R)).

Cell culture. Mouse embryonic fibroblasts (MEFs) were prepared by standard 
procedures. HEK293T (American Type Culture Collection, not authenticated nor tested
for mycoplasma contamination) and MEF cells were maintained in Dulbecco's
modified Eagle's medium (Sigma) with GlutaMAX (Gibco) and 10% fetal bovine
serum at 37°C in 5% CO2. HEK293T cells were transfected with various constructs
as listed. Transfections were performed using Lipofectamine 2000 (Invitrogen) 
according to the manufacturer’s protocol and cultured for 48h before co-IP and 
western blot analysis. 

For the serine toxicity assay, freshly isolated MEFs were cultured in increasing 
concentrations of serine for 24h; MEFs were collected and stained with propidium 
iodide. The percentage of cell death was measured by fluorescence-activated cell 
sorting (n = 3, Fig. 3d). Data represent mean ± s.d. and were analysed by two-way
ANOVA using Tukey's multiple comparison tests. For fluorescence microscopy
(n = 4, Fig. 4g), Aars^sti/sti embryonic fibroblasts were co-transfected with phrGFP
II and either ANKRD16-Flag or ANKRD16(3×R)-Flag using the P4 Primary Cell
4D-Nucleofector X Kit (Lonza), with around 50% transfection and around 95%
co-transfection efficiency. At 12 h after transfection and seeding, cells were washed
twice with PBS and media with either 0.4 mM or 40 mM serine was added for 24 h.
Subsequently, cells were stained with propidium iodide (Invitrogen, P21493, 250 ng
ml^-1) and Hoechst 33342 (Life Technologies, H3570, 5 μg ml^-1) to determine cell
death. The number of cells that were propidium iodide- and GFP-positive out of
the total GFP-positive cells was determined. About 1,000 cells were counted per
genotype and condition. Cell death was expressed as the percentage increase in
cell death between low and high serine conditions. Data represent mean ± s.e.m.
and were analysed by one-way ANOVA using Tukey's multiple comparison tests.
Cell fractionation. Brains from six-week-old mice were isolated and immediately 
processed for subcellular fractionation as previously described. In brief, brains
were excised, immersed in ice-cold PBS, and washed three times. After washing, 
brain tissue was minced into smaller pieces and homogenized in buffer (0.25 M 
sucrose, 10mM HEPES, pH 7.5, protease inhibitor cocktail (Roche)) using a Potter- 
Elvehjem Tissue Grinder. The homogenate was centrifuged at 1,000g for 10 min to 
generate a crude cytoplasmic fraction and nuclear fraction for further purification 
by ultracentrifugation. The supernatant (crude cytoplasmic fraction) from four 
brains was pooled to purify the mitochondrial, rough endoplasmic reticulum and 
cytoplasmic fractions, and the pellet (crude nuclear fraction) from two brains 
was used to purify the nuclear fraction. Eventually, purified pellets (mitochon- 
dria, nucleus, rough endoplasmic reticulum) were lysed (4°C, 30 min) in lysis 
buffer (50 mM Tris-HCl, 150mM NaCl, 1mM EDTA, 1% Triton X-100, 0.1% SDS) 
and proteins from the cytoplasmic fraction were concentrated by ultrafiltration 
(Millipore, Amicon Ultra-15, #UFC900308). Equal amounts of total protein from 
each cellular fraction were subject to electrophoresis on SDS-PAGE gels. 
Reverse transcription, quantitative PCR, and genomic PCR analysis. Cerebella 
from various inbred strains were isolated and immediately frozen in liquid 
nitrogen. Total RNA was extracted with TRIzol reagent (Life Technologies). cDNA
synthesis was performed on DNase-treated (DNA-free DNA Removal Kit, Life 
Technologies AM1906) total RNA using oligo(dT) primers and the SuperScript 
III First-Strand Synthesis System (Invitrogen/Life Technologies). Quantitative 
RT-PCRs (qRT) were performed using iQ SYBR Green Supermix (Bio-Rad) and 
an iQ5 Multicolor Real-Time PCR Detection System (Bio-Rad). Reactions were 
performed with primers either complementary to the CAST or to the B6 Ankrd16 
sequence. Both primer sets showed similar results; results using B6 primers are 
shown in Fig. 2d. Expression of Ankrd16 transcript was normalized to 18S rRNA
using the 2^-ΔΔCT method, expressed as fold change ± standard error of the mean
(s.e.m.) relative to control (C57BL/6J) and analysed by multiple t-tests using
Holm-Sidak comparison tests (n = 3). Ankrd16 quantitative and conventional RT-PCR
analysis was performed on 50 ng of cerebellar cDNA generated from one- or two- 
month-old mice, respectively. 
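
The 2^-ΔΔCT normalization named above can be sketched in a few lines of Python. The function name `fold_change` and the threshold-cycle (CT) values below are hypothetical and serve only to show the arithmetic; they are not measurements from this study.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCT method.

    Each target CT is first normalized to the reference gene (here 18S rRNA)
    in the same sample; the sample is then compared with the control strain.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical CT values: target detected two cycles earlier in the sample
# than in the control, with identical reference-gene CT values.
result = fold_change(
    ct_target_sample=24.0,
    ct_ref_sample=10.0,
    ct_target_control=26.0,
    ct_ref_control=10.0,
)
print(f"fold change relative to control: {result:.1f}")
```

A two-cycle difference after normalization corresponds to a fourfold difference in transcript abundance, assuming near-perfect amplification efficiency.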

To confirm non-synonymous SNPs of candidate genes, PCRs were performed 
on genomic DNA from various inbred strains. PCR products were purified 
(Agencourt AMPure XP—PCR Purification, Beckman Coulter) and subjected 
to sequencing. 

The RT-PCR primers used are as follows:
5'UTR Ankrd16 forward: 5'GTCTTCCTCCTACTTTTGTCCA3',
Exon2 Ankrd16 reverse: 5'CAAGGTTCTTCCTTGTGCAC3';
Exon1 Ankrd16 forward: 5'TCGTGGACTCCTTGAAGAAG3',
Exon3 Ankrd16 reverse: 5'TTCCAAACAGCCGTGCATTG3';
Exon2 Ankrd16 forward: 5'TGATCCTCCGGTACTTGCT3',
Exon5 Ankrd16 reverse: 5'ATCTACATCGATGCCAAGACC3';
Exon4 Ankrd16 forward: 5'GCTGCTCCTTGAACAGCATAA3',
Exon6 Ankrd16 reverse: 5'CACCCAAGGACAACAGAGTT3';
Exon5 Ankrd16 forward: 5'CAGCACTTCACTATGCAGCA3',
Exon7 Ankrd16 reverse: 5'CATGGTCAAAGTCCTGAAGGA3';
Exon6 Ankrd16 forward: 5'GTGCCGACATCAACTCTACA3',
3'UTR Ankrd16 reverse: 5'TGAACGGCCTTGAACTCCAT3';
Exon3 Ankrd16 forward: 5'GCTGTTTGGAAGCAGTCCA3',
Exon5 Ankrd16 reverse: 5'CCAAGGAAGAGCTTACGCTAT3'.

qRT-PCR primers: Exon 5/6 (B6) forward: 5'TGCAGCAAAGGAAGGACAGA3',
or Exon 5/6 (CAST) forward: 5'TGCAGCGAAGGAAGGACAGA3',
Exon7 Ankrd16 reverse: 5'GTAGGAGGAGCCTGGTGCAA3';
18S RNA forward: 5'GAGGGAGCCTGAGAAACGG3',
18S RNA reverse: 5'GTCGGGAGTGGGTAATTTGC3'.
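
The B6 and CAST forward qRT-PCR primers listed above differ at a single base, which is what makes them strain-specific. The following Python sketch compares the two sequences position by position; the helper name `mismatches` is an illustrative assumption, and the sequences are those given in the primer list.

```python
def mismatches(seq_a: str, seq_b: str):
    """Return (1-based position, base in seq_a, base in seq_b) for each difference."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


b6_forward = "TGCAGCAAAGGAAGGACAGA"    # Exon 5/6 (B6) forward primer
cast_forward = "TGCAGCGAAGGAAGGACAGA"  # Exon 5/6 (CAST) forward primer

differences = mismatches(b6_forward, cast_forward)
for position, b6_base, cast_base in differences:
    print(f"position {position}: B6 {b6_base} vs CAST {cast_base}")
```

The same helper can be used to check any pair of allele-specific primers of equal length before ordering them.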

Non-synonymous SNP primers (Msti candidate genes):
Il2ra(T101S/Q106H) forward: 5'TCCCATATTCTCCAGCCCAT3',
Il2ra(T101S/Q106H) reverse: 5'CACCTGCCTGTTGAGAACA3';
Il2ra(S143R) forward: 5'TGCTGTGTCTTCCCAAACTT3',
Il2ra(S143R) reverse: 5'CCCGTTTTCCCACACTTCAT3';
Il2ra(G158S) forward: 5'CCACCTCCTTGGAAACATGA3',
Il2ra(G158S) reverse: 5'TCGGTGGTGTTCTCTTTCATC3';
Il2ra(M225T) forward: 5'GCAGTCCTATGCTAGCCATAA3',
Il2ra(M225T) reverse: 5'CGTCTTTGCATGCTTCACC3';
Il2ra(K236Q/M233V) forward: 5'TTAGTGGCCGTTTCAACCAG3',
Il2ra(K236Q/M233V) reverse: 5'CTTGTGCAGTCCTATGCTAGC3';
Fbxo18(H15Q) forward: 5'CCCTATTCCGTCCTTTTGTTC3',
Fbxo18(H15Q) reverse: 5'ATGAGACGGTTTAAGCGGAA3';
Fbxo18(P369/376A) forward: 5'ATTAACGTCTGGGCCTTGGT3',
Fbxo18(P369/376A) reverse: 5'TGCTGATGTTAATCCCCTTCTC3';
Ank16(E335A/T346M) forward: 5'ATCTGGCCTGCGCAGGTCA3',
Ank16(E335A/T346M) reverse: 5'CTTGAACTCCATCGCCTC3'.

Co-immunoprecipitation experiments. Brain or liver tissue from four-week-old 
Tg-CAG Ankrd16-Myc or non-transgenic mice were isolated and immediately 
used or frozen in liquid nitrogen. Whole protein lysate from either tissue sam- 
ples or HEK293T cells was extracted with solubilization buffer (20 mM Tris-HCl 
(pH 7.4), 137mM NaCl, 0.1% Triton X-100, Protease Inhibitor Cocktail (Roche)) 
using a Potter-Elvehjem Tissue Grinder (for tissue samples). Protein extraction 
was performed at 4°C for either 45 min (tissue samples) or 30 min (cell culture 
samples) under gentle agitation. The lysate was centrifuged at 16,000g at 4°C for 
either 30 min (tissue samples) or 20 min (cell culture samples). The lysate was pre- 
cleared in two steps at 4°C under gentle agitation, first with Dynabeads Protein G 
(Dynabeads Protein G Immunoprecipitation Kit, Invitrogen, 10007D) for 90 min 
followed by clearing with normal IgG complex for 90 min (corresponding to the 
species of IP antibody, Santa Cruz, mouse IgG (sc-2025), or goat IgG (sc-2028)) 
that was pre-immobilized on Dynabeads Protein G. Subsequently, pre-cleared 
lysates were incubated with pre-immobilized Dynabeads Protein G with antibody 
(mouse anti-Myc, Abcam, ab18185; mouse anti-Flag, Sigma, clone M2, F1804; or 
goat anti-HA, Santa Cruz, HA probe Y-11, sc-805-G) for 3 to 5h at 4°C under gen- 
tle agitation. After washing with wash buffer (20 mM Tris-HCl (pH 7.4), 137 mM 
NaCl, 0.5% Triton X-100 (tissue samples) or 20 mM Tris-HCl (pH 7.4), 300 mM 
NaCl, 0.1% Triton X-100 (cell culture samples)), proteins were eluted by incubation 
with SDS-PAGE sample buffer for western blotting. 

Mass spectrometry. Tissue lysates were prepared as described for co-immunopre- 
cipitation and proteins were eluted twice by incubation with 2% SDS for 5 min at 
90°C. After elution, proteins were recovered in 100 mM Tris buffer (pH 7.4), and 


purified using chloroform/methanol precipitation. Protein pellets were dried using 
a speed vac and dissolved in 30 μl of 20 mM Tris buffer (137 mM NaCl, pH 7.4).
The two elutions for each sample were combined (total volume 60 μl), then mixed
with 33.5 μl of 50 mM NH4HCO3 and 1 μl of 0.5 M tris(2-carboxyethyl)phosphine
hydrochloride (Thermo Scientific, #20490). Samples were incubated for 20 min,
occasionally vortexed, mixed with 2.7 μl of 0.55 M iodoacetamide, incubated for
15 min in the dark, and mixed with 1 μl of 1% ProteaseMax (Promega, V2071)
to enhance the subsequent digestion. Trypsin digests were performed at 37°C
overnight following the instructions for 'solution digestion' using Trypsin Singles
(2 μg of trypsin per sample, Proteomics Grade, Sigma, T7575), mixed with 5%
formic acid (final concentration 0.5%), centrifuged for 30 min at 13,000g, and the 
supernatant was collected for analysis by mass spectrometry. 

For mass spectrometric analysis, a four-step MudPIT analysis was performed 
using an Accela pump and a Thermo LTQ-XL (ThermoFisher Scientific) using an 
electrospray stage built in-house. Tandem mass spectra were extracted from raw
files using RawExtract 1.9.9 and were searched against a Uniprot Mus musculus
database with reversed sequences using ProLuCID. The search space included
all fully-tryptic and half-tryptic peptide candidates with a fixed modification of 
57.02146 C. Proteins were considered putative interaction candidates if they were 
detected with at least five peptides (total number of peptides). Proteins detected in 
both transgenic (experimental sample) and non-transgenic (negative sample) tis- 
sue were considered non-specific and excluded as putative interaction candidates. 
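
The candidate-selection rule described above (at least five peptides in the transgenic sample, and no detection in the non-transgenic control) can be written as a short filter. In the Python sketch below, the function name `putative_interactors`, the dictionary layout and the peptide counts are all hypothetical; they illustrate the rule and do not reproduce the actual data set or analysis software.

```python
def putative_interactors(transgenic_counts, control_counts, min_peptides=5):
    """Apply the two selection criteria to per-protein peptide counts.

    A protein is kept if it was detected with at least `min_peptides`
    peptides in the transgenic sample and was not detected at all in the
    non-transgenic control.
    """
    return sorted(
        protein
        for protein, n_peptides in transgenic_counts.items()
        if n_peptides >= min_peptides and control_counts.get(protein, 0) == 0
    )


# Hypothetical peptide counts per protein (illustration only).
transgenic = {"AlaRS": 42, "ProteinX": 3, "ProteinY": 9}
control = {"ProteinY": 2}

candidates = putative_interactors(transgenic, control)
print(candidates)
```

Here ProteinX is dropped for having too few peptides and ProteinY for appearing in the control, which leaves AlaRS as the only candidate.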
Western blotting. Proteins were extracted with RIPA buffer (including Proteinase 
Inhibitor cocktail, Roche) or solubilization buffer (see above) when used for co-IP 
experiments, and were resolved on SDS-PAGE gels before transfer to PVDF mem- 
branes (GE Healthcare Life Sciences, #10600023) using a tank blotting appara- 
tus (BioRad). After blocking in 5% non-fat dry milk (Cell Signaling, #9999S), 
blots were probed with primary antibodies at 4°C overnight: mouse anti-Myc 
(Abcam, ab18185, 1:1,000), goat anti-Myc (Abcam, ab9132, 1:1,000), rabbit anti- 
Myc (Abcam, ab9106, 1:2,000), guinea pig anti-ANKRD16 (1:150), rabbit anti-
ANKRD16 (1:400), mouse anti-AlaRS (A-6, Santa Cruz, sc-165990, 1:100), rabbit 
anti-AlaRS (Sigma, HPA040870, 1:500), rabbit anti-GAPDH (Cell Signaling, 
#2118, 1:6,500), mouse anti-Flag (Sigma, M2 clone, #F1804, 1:5,000), rabbit anti-
Flag (Sigma, F-7425, 1:5,000), mouse anti-HA (Covance, MMS-101P, 1:2,000), 
rabbit anti-Histone 3-HRP (2h incubation at room temperature, Cell Signaling, 
#5192, 1:500), rabbit anti-COX IV (Cell Signaling, #4850, 1:5,000), rabbit anti- 
Grp78 (StressGen, SPA-826, 1:500), goat anti-Sec61β (Santa Cruz, sc-27694, 1:500),
followed by incubation with HRP-conjugated secondary antibodies for 2 h at
room temperature: goat anti-rabbit IgG (BioRad, #170-6515), goat anti-mouse 
IgG (BioRad, #170-6516), bovine anti-goat IgG (Santa Cruz, sc-2384), donkey 
anti-goat IgG (Santa Cruz, sc-2056), goat anti-guinea pig IgG (Santa Cruz, sc-2903) 
or donkey anti-guinea pig IgG (Millipore, AP193P). Signals were detected with 
SuperSignal West Pico Chemiluminescent substrate (Thermo Scientific, #34080). 
Protein levels (cerebellum; C57BL/6J, CAST/Ei, Aars(sti/sti),
Aars(sti/sti) Msti(CAST/CAST) and Msti(CAST/B6); three- to four-week-old mice, n = 3) and
tissue expression (C57BL/6J, Msti(CAST/B6) and Msti(CAST/CAST) six-week-old mice)
of ANKRD16 were quantified, normalized to GAPDH, expressed as a fold
change ± s.d. relative to Aars(sti/sti) and analysed by ordinary one-way ANOVA
(Tukey test). Binding affinity was quantified by normalizing
the immunoprecipitation signal to the input signal of AlaRS. 
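
The normalization described above can be sketched as follows. This is a minimal illustration, assuming band intensities from densitometry are already available as numbers; the function names and all intensity values are hypothetical, and the ANOVA step is not reproduced here.

```python
from statistics import mean, stdev


def fold_changes(target, gapdh, ref_target, ref_gapdh):
    """Normalize each target band to its GAPDH band, then divide by the
    mean normalized signal of the reference genotype."""
    reference = mean(t / g for t, g in zip(ref_target, ref_gapdh))
    return [(t / g) / reference for t, g in zip(target, gapdh)]


def relative_binding(ip_signal, input_signal):
    """Co-IP signal of AlaRS normalized to its input signal."""
    return ip_signal / input_signal


# Hypothetical densitometry values (arbitrary units), n = 3 per genotype.
ref_target, ref_gapdh = [100.0, 110.0, 90.0], [1000.0, 1000.0, 1000.0]
test_target, test_gapdh = [300.0, 330.0, 270.0], [1000.0, 1000.0, 1000.0]

test_fc = fold_changes(test_target, test_gapdh, ref_target, ref_gapdh)
fc_mean, fc_sd = mean(test_fc), stdev(test_fc)  # reported as fold change ± s.d.
```
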

Protein expression and purification. Histidine-tagged human or mouse AlaRS 
(full length and truncations (AlaRS-ΔAAD has an N-terminal SUMO tag)), E. coli
AlaRS, human TyrRS, human TrpRS, mouse ANKRD16 and mouse ANKRD29 
were expressed in E. coli and purified by Ni-NTA resin (Qiagen), followed by 
HiLoad 16/60 Superdex 200 prep grade column (GE Healthcare). Glutathione 
S-transferase (GST)-tagged mouse ANKRD16 was expressed in E. coli and purified
by glutathione-Sepharose 4B beads (GE Healthcare), followed by a HiLoad 16/60
Superdex 200 prep grade column. The final purified proteins were stored in
20 mM HEPES (pH 7.5), 200 mM NaCl and 1 mM dithiothreitol (DTT).

In vitro transcription of tRNA(Ala). The DNA template containing a T7 promoter
and a tRNA(Ala) gene was synthesized by PCR of overlapping oligonucleotides. The
transcription reaction was performed in 40 mM Tris-HCl (pH 8.0), 25 mM NaCl,
20 mM MgCl₂, 2 μg ml⁻¹ pyrophosphatase, 0.1 mg ml⁻¹ bovine serum albumin
(BSA), 5 mM DTT, 20 mM NTPs with T7 polymerase and the DNA template at
37°C for 2 h. The tRNA transcript was purified by phenol-chloroform extraction.
The purified tRNA was annealed by heating to 95°C for 3 min and then slowly
cooled to room temperature, with addition of 1 mM MgCl₂ at 55°C.

Dipeptide translation experiments. Ribosomal subunits (from E. coli MRE600) 
and EF-Tu were purified as described*?°. The model mRNA used is identical 
to m291"!, except with GCU as codon 2, and was made by in vitro transcription 
followed by PAGE purification. Native E. coli tRNA(fMet) was purchased (Chemical
Block Ltd) and charged and formylated as described”. A transcript corresponding 
to human tRNA(Ala) (AGC) was generated in vitro, purified by PAGE and charged
with alanine and serine’? using AlaRS and AlaRS(1-455), respectively. 
70S initiation complex was formed by incubating heat-activated 30S subunits
(0.2 μM), mRNA (1 μM), formyl-[³⁵S]-Met-tRNA(fMet) (0.2 μM), 50S subunits
(0.6 μM) and 1 mM GTP in buffer A (50 mM K-HEPES pH 7.6, 10 mM KCl,
100 mM NH₄Cl, 10 mM MgCl₂, 1 mM DTT) at 37°C for 30 min. Charged tRNA
was used to form the ternary complex by incubating either Ser-tRNA(Ala) or
Ala-tRNA(Ala) (1 μM) in buffer A with EF-Tu (4 μM), GTP (1 mM),
phosphoenolpyruvate (2 mM), and pyruvate kinase (Sigma; 50 μg ml⁻¹) at 37°C for 15 min. The
ternary complex was used in reactions at 1 μM. ANKRD16 was purified as above
and used in reactions at 4 μM.

For experiments that begin with deacylated tRNA, tRNA(Ala) (5 μM) was
pre-equilibrated at 37°C with EF-Tu (8 μM), amino acid (for example, 100 μM
alanine (Fig. 4a), 500 μM serine and 1 μM alanine (Fig. 4b) or 500 μM serine
(Extended Data Fig. 3c)), GTP (1 mM), ATP (10 mM), phosphoenolpyruvate
(2 mM), pyruvate kinase (50 μg ml⁻¹), pyrophosphatase (Roche; 8 ng μl⁻¹), and
BSA (4 μM) in buffer A, and AlaRS (4 μM) was added to initiate the reaction.
Peptide products formed upon reaction of 70S initiation complex and the ternary 
complex were resolved by electrophoretic thin layer chromatography (eTLC)* and 
quantified using a Typhoon FLA9000 phosphoimager (GE Healthcare). Apparent 
rates were obtained by fitting the kinetic data to a single exponential equation. 
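
As an illustration of fitting kinetic data to a single exponential, the sketch below fits y = A(1 − exp(−k·t)) by least squares using only the Python standard library. It is not the authors' fitting software: the function name, the search bounds for k and the synthetic data are assumptions made for this example. For a fixed k the best amplitude A has a closed form, so only k is searched (golden-section search in log k).

```python
import math


def fit_single_exponential(times, fractions, k_min=1e-4, k_max=10.0, iterations=200):
    """Least-squares fit of y = A * (1 - exp(-k * t)); returns (A, k_apparent)."""

    def basis(k):
        # 1 - exp(-k t), computed with expm1 for accuracy at small k * t
        return [-math.expm1(-k * t) for t in times]

    def amplitude(k):
        f = basis(k)
        return sum(y * fi for y, fi in zip(fractions, f)) / sum(fi * fi for fi in f)

    def sse(log_k):
        k = math.exp(log_k)
        a = amplitude(k)
        return sum((y - a * fi) ** 2 for y, fi in zip(fractions, basis(k)))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log(k_min), math.log(k_max)
    for _ in range(iterations):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if sse(c) < sse(d):
            hi = d
        else:
            lo = c
    k = math.exp(0.5 * (lo + hi))
    return amplitude(k), k


# Synthetic, noise-free time course (hypothetical values, time in arbitrary units).
true_a, true_k = 0.8, 0.3
times = [0.5, 1, 2, 4, 8, 16]
fractions = [true_a * (1 - math.exp(-true_k * t)) for t in times]
amp, k_app = fit_single_exponential(times, fractions)
```

The golden-section step assumes the error has a single minimum within the search bounds, which is reasonable for a clean single-exponential time course but should be checked for noisy data.
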

E. coli strain construction and halo assay. Editing-defective AlaRS(C666A/ 
Q584H) E. coli was prepared and the halo assay performed as described™.. In brief, 
E. coli were transformed with empty pBAD33/21 vector or pBAD33/21 containing 
mouse wild-type ANKRD16, ANKRD16(3×R) or mouse ANKRD29. The resulting
strains were grown overnight in M9 minimal media with 0.4% glycerol, 0.002%
L-arabinose, 0.01 mg ml⁻¹ thiamine and the following antibiotics: kanamycin
(50 μg ml⁻¹), ampicillin (100 μg ml⁻¹), and chloramphenicol (30 μg ml⁻¹). Five
microlitres of an overnight culture were diluted (1:100) and spread on M9-Kan/ 
Amp/Cm plates. One hundred microlitres of 1 M L-serine were added to a well cut 
into the centre of the plate and allowed to diffuse into the agar, creating a radial 
gradient. Plates were incubated at 37°C and imaged after 48 h. 

SwitchSENSE measurements. Wild-type AlaRS, AlaRS(A734E) and AlaRS(1-455) 
were coupled to the 5’ end of a 48-mer ssDNA complementary nanolever sequence 
(cNL-B48) using the Amine Coupling Kit 1 (Dynamic Biosensors), and the prod- 
ucts were purified using an AKTA start system (GE Life Sciences) equipped with 
an Anion Exchange Column (Dynamic Biosensors). The cNL-B48 conjugates were 
then hybridized to a 3’ red-fluorescent labelled 48mer ssDNA nanolever tethered to 
a biochip electrode. The analyte, wild-type or mutant ANKRD16, was injected onto
the electrode in three different concentrations (0.63, 1.25, and 2.50 or 5.00 μM)
at a constant flow rate of 50 μl min⁻¹ for 3.5 min and dissociation was observed
at a buffer flow rate of 200 μl min⁻¹ for 6 min. Measurements were performed
on a DRX2 instrument (Dynamic Biosensors) and switchSENSE technology was 
employed as previously described“*, Molecular interactions were recorded at con- 
stant negative potential (—0.1 V) on the basis of the changes in the red fluorescence 
signal upon analyte binding. Kinetic curves were referenced to the corresponding 
buffer control and kinetic values were determined by global monophasic fitting 
using switchANALYSIS software (Dynamic Biosensors). 
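
For a monophasic (1:1) binding model, the equilibrium dissociation constant follows from the fitted rate constants as Kd = koff/kon. The sketch below is a hedged cross-check of that relation, not the switchANALYSIS procedure: the function name is illustrative, and the rate constants are the mean values as best read from the extracted Extended Data Fig. 2f.

```python
def dissociation_constant_um(k_on, k_off):
    """Kd in micromolar from k_on (M^-1 s^-1) and k_off (s^-1)."""
    return k_off / k_on * 1e6


# (k_on, k_off) mean values as read from Extended Data Fig. 2f.
rates = {
    "AlaRS / ANKRD16": (5310.0, 0.0105),
    "AlaRS(A734E) / ANKRD16": (3985.0, 0.0076),
    "AlaRS(1-455) / ANKRD16": (3660.0, 0.0087),
    "AlaRS(A734E) / ANKRD16(3xR)": (1915.0, 0.0103),
}
# Kd values (micromolar) reported in the same panel.
kd_reported = {
    "AlaRS / ANKRD16": 1.98,
    "AlaRS(A734E) / ANKRD16": 1.90,
    "AlaRS(1-455) / ANKRD16": 2.37,
    "AlaRS(A734E) / ANKRD16(3xR)": 5.41,
}
kd_calc = {pair: dissociation_constant_um(*r) for pair, r in rates.items()}
for pair, value in kd_calc.items():
    print(f"{pair}: calculated {value:.2f} uM, reported {kd_reported[pair]:.2f} uM")
```

The calculated values agree with the reported ones to within a few hundredths of a micromolar; small differences are expected if the reported Kd was averaged over replicates rather than computed from the mean rates.
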

Aminoacylation and misacylation assays. Aminoacylation assays were performed 
as described’. In brief, assays were carried out at room temperature using either 
mouse AlaRS(A734E) alone (50 nM, Extended Data Fig. 3g or 2 μM, Extended
Data Fig. 4e) or a mixture of AlaRS(A734E) and 10 μM ANKRD16 with 50 mM
HEPES (pH 7.5), 20 mM KCl, 4 mM ATP, 5 mM MgCl₂, 2 mM DTT, 4 μg ml⁻¹ of
inorganic pyrophosphatase (Roche), 20 μM [³H]-L-alanine, 2 μM BSA and 10 μM
in vitro transcript corresponding to human tRNA(Ala).

Misacylation assays were performed at 25°C in reactions with 5 μM
AlaRS(A734E) alone or a mixture of 5 μM AlaRS(A734E) and 10 μM ANKRD16
with 50 mM HEPES (pH 7.5), 50 mM KCl, 4 mM ATP, 10 mM MgCl₂, 5 mM DTT,
4 μg ml⁻¹ of inorganic pyrophosphatase (Roche), 50 μM [³H]-L-serine, 2 μM BSA
and 10 μM in vitro transcript corresponding to human tRNA(Ala).

Serine- or alanine-linked ANKRD16 (aa-linked ANKRD16) and aminoacylated
tRNA(Ala) (aa-linked tRNA(Ala)) were determined based on aminoacylation reactions
with [³H]alanine or [³H]serine using 5 μM AlaRS(A734E) in the presence of 10 μM
tRNA(Ala) and 10 μM ANKRD16. The reaction was incubated for 1 h, then half of the
reaction was treated with alkaline pH (Na₂CO₃) for 20 min to hydrolyse acylated
tRNA(Ala) (leaving aa-linked ANKRD16), followed by trichloroacetic acid (TCA)
precipitation. Aminoacylated tRNA(Ala) was the difference between total apparent
aminoacylated tRNA (no alkaline hydrolysis) and aa-linked ANKRD16 (after
alkaline hydrolysis). Data represent mean ± s.d. and were analysed by a two-tailed
Student's t-test.
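
The subtraction described above can be written out explicitly. This is a minimal sketch with hypothetical scintillation counts; the function name and dictionary keys are illustrative, and it assumes the two halves of the reaction are otherwise equivalent.

```python
def split_tca_counts(total_cpm, after_alkali_cpm):
    """Partition TCA-precipitable counts into tRNA-linked and ANKRD16-linked amino acid.

    total_cpm:        counts without alkaline hydrolysis (aa-tRNA plus aa-ANKRD16)
    after_alkali_cpm: counts after alkaline hydrolysis (aa-ANKRD16 only)
    """
    if after_alkali_cpm > total_cpm:
        raise ValueError("alkali-resistant counts cannot exceed total counts")
    return {
        "aa_tRNA": total_cpm - after_alkali_cpm,
        "aa_ANKRD16": after_alkali_cpm,
    }


# Hypothetical counts per minute, for illustration only.
result = split_tca_counts(12000.0, 4500.0)
```
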

ATP hydrolysis assays. ATPase assays were performed as previously described. 
Assays measuring the hydrolysis of misactivated serine (Ser-AMP) were carried 
out at room temperature in a reaction with either 500nM mouse AlaRS alone, or 
a mixture of 500 nM AlaRS and 10 μM wild-type ANKRD16, mutant ANKRD16,
or ANKRD29 proteins. Reactions also contained 100 mM HEPES (pH 7.5), 40 mM
KCl, 10 mM MgCl₂, 0.5 mM ATP, 0.5 μM [α-³²P]ATP, 10 units per ml inorganic
pyrophosphatase, 2 mM DTT and 200 mM serine. Two microlitre aliquots of
the editing reaction were taken at appropriate time points and quenched in 8 μl
200 mM sodium acetate (pH 5.0), and then 1 μl was spotted on a polyethyleneimine
thin-layer chromatography cellulose plate (EMD Millipore) and run in a solution
containing 0.1 M ammonium acetate and 5% glacial acetic acid to separate [α-³²P]ATP,
[³²P]AMP and Ser-[³²P]AMP. Results were imaged using a phosphoimager.
The percentage of hydrolysed AMP (ATPase activity) was quantified with the
Molecular Dynamics ImageQuant software. Data represent mean ± s.d. and were
analysed by one-way ANOVA using Tukey’s multiple comparison tests.

Far-UV circular dichroism. Circular dichroism spectra were obtained with a CD 
Spectrometer 400 (Aviv Biomedical, Inc.). Protein samples were prepared at 10 μM
in PBS and measurements were scanned from 190 to 260 nm in increments of 
0.5nm at 20°C. Three independent scans were acquired and averaged for each 
protein sample. 

Thermal shift assays. Thermal shift assays were performed using the Protein 
Thermal Shift Dye Kit (ThermoFisher Scientific) according to the manufacturer's 
instructions and run on a StepOnePlus 96 Real Time Cycler (Applied Biosystems). 
In brief, a reaction of 20 μl containing 5.0 μl thermal shift buffer, 12.5 μl of 10 μM
wild-type ANKRD16 or ANKRD16(3×R) and 2.5 μl of 8× thermal shift dye was
mixed in a 96-well Optical Reaction Plate (Applied Biosystems). The plate was
sealed and heated in the StepOnePlus Real-Time PCR from 25 to 90°C in incre- 
ments of 0.15°C. The measurements were performed in four replicates and analysed 
using the Protein Thermal Shift software (ThermoFisher Scientific). 
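
As a quick arithmetic check of the reaction set-up above, the sketch below sums the component volumes and derives the final protein and dye concentrations. The variable names are illustrative, and the volumes are those read from the text (5.0 μl buffer, 12.5 μl of 10 μM protein, 2.5 μl of 8× dye).

```python
# Component volumes in microlitres, as stated in the text.
volumes_ul = {
    "thermal shift buffer": 5.0,
    "protein stock (10 uM)": 12.5,
    "dye stock (8x)": 2.5,
}
total_ul = sum(volumes_ul.values())

# Final concentration = stock concentration * (stock volume / total volume)
final_protein_um = 10.0 * volumes_ul["protein stock (10 uM)"] / total_ul
final_dye_x = 8.0 * volumes_ul["dye stock (8x)"] / total_ul

print(total_ul, final_protein_um, final_dye_x)
```

The volumes add up to the stated 20 μl, giving 6.25 μM protein and 1× dye in the final reaction.
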

Deacylation assay. The aminoacylation assays were performed at room temperature
with 50 mM HEPES (pH 7.5), 20 mM KCl, 5 mM MgCl₂, 4 mM ATP, 2 mM DTT,
4 μg ml⁻¹ pyrophosphatase, 20 μM cold L-serine, 2.68 μM [³H]serine (1 mCi ml⁻¹)
as the assay solution. 20 μM tRNA(Ala) transcript was mixed with the assay solution
and the reaction was initiated by adding 5 μM AlaRS protein into the
mixture. After 30 min, a phenol-chloroform (pH 5.0) extraction was performed,
and the product [³H]serine-tRNA(Ala) was used as the substrate of the deacylation
assay. [³H]serine-tRNA(Ala) was incubated at room temperature with 100 nM
AlaRS (wild-type AlaRS and AlaRS(A734E)) and ANKRD16 protein in assay buffer
(50 mM HEPES (pH 7.5), 20 mM KCl, 10 mM MgCl₂, 5 mM DTT). At varying time
intervals, 5 μl aliquots were applied to a MultiScreen 96-well filter plate (0.45 μm
pore size hydrophobic, low protein-binding membrane, Merck Millipore Ltd),
which was pre-wetted with a quenching solution containing 0.5 mg ml⁻¹ DNA and
100 mM EDTA in 300 mM NaOAc (pH 3.0). After all time points were collected,
100 μl of 20% TCA was added to precipitate the nucleic acids. The plate was then
washed four times with 200 μl of 5% TCA containing 100 mM cold serine, then
once with 95% ethanol. On drying after completion of the washing steps, 70 μl of
100 mM NaOH was added to elute the tRNA, which was then centrifuged into a
96-well flexible PET microplate (PerkinElmer) with 150 μl of Supermix scintillation
cocktail (PerkinElmer). After mixing, the radioactivity in each well of the plate was
counted using the 1450 LSC & Luminescence Counter (PerkinElmer).

ATP-pyrophosphate exchange assays. The ATP-pyrophosphate exchange assays 
were performed as previously described“. In brief, a reaction mixture of either 
250 nM mouse AlaRS(A734E) alone or a mixture of 250 nM AlaRS(A734E) and
10 μM ANKRD16 was incubated at room temperature with 100 mM Tris-HCl
(pH 7.5), 10 mM KF, 10 mM MgCl₂, 1 mM ATP, 2 mM DTT, 0.1 mg ml⁻¹ BSA,
20 mM L-serine, and 0.5 mM sodium [³²P]pyrophosphate.

Filter binding of EF-Tu with aminoacyl-tRNA(Ala). The nitrocellulose filter-binding
assays were performed as described*’, with some modifications. Aminoacyl-tRNA(Ala)
was generated in 200 μl aminoacylation reactions containing either
500 nM AlaRS(A734E) alone or a mixture of 500 nM AlaRS(A734E) and 10 μM
ANKRD16, in reaction buffer with 50 mM HEPES (pH 7.5), 50 mM KCl, 4 mM
ATP, 10 mM MgCl₂, 5 mM DTT, 4 μg ml⁻¹ of inorganic pyrophosphatase (Roche),
50 μM L-alanine or L-serine, 2 μM BSA, and 10 μM in vitro transcript corresponding
to human tRNA(Ala). Reactions were incubated for 30 min at room temperature.
EF-Tu was loaded with GTP in a 200 μl reaction of 2 μM EF-Tu, 3 μM [³H]GTP,
50 mM Tris-HCl (pH 7.4), 50 mM NH₄Cl, 10 mM MgCl₂, 3.75 mM
phosphoenolpyruvate, 0.05 μg μl⁻¹ pyruvate kinase, which was incubated for 2 min at 37°C
and then placed on ice. An equal volume (200 μl) of the aminoacylation reaction
mixture or non-acylated tRNA(Ala) was added to the [³H]GTP-EF-Tu reaction
mixture, incubated for 2 min on ice, and terminated by the addition of 2 ml of a
cold wash buffer containing 10 mM Tris-HCl (pH 7.4), 50 mM NH₄Cl and 10 mM
MgCl₂. The reaction mixture was filtered through a nitrocellulose filter, and the
filter was washed with 15 ml of cold wash buffer. The filter was dissolved in a
scintillation fluid and counted for radioactivity. The percentage of EF-Tu retained
in the membrane was normalized to that retained in the non-acylated tRNA(Ala)
reaction.
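
Two small calculations are implicit in the procedure above: mixing equal 200 μl volumes halves each component's concentration, and retention is expressed relative to the non-acylated tRNA control. The sketch below illustrates both; the function names and the filter counts are hypothetical.

```python
def final_concentration(stock_um, stock_volume_ul, total_volume_ul):
    """Concentration after dilution into the combined reaction volume."""
    return stock_um * stock_volume_ul / total_volume_ul


def percent_of_control(sample_cpm, nonacylated_cpm):
    """Filter-retained [3H]GTP-EF-Tu as a percentage of the non-acylated tRNA control."""
    return 100.0 * sample_cpm / nonacylated_cpm


# 200 ul EF-Tu reaction (2 uM EF-Tu) mixed with 200 ul aminoacylation reaction (10 uM tRNA).
ef_tu_final_um = final_concentration(2.0, 200.0, 400.0)
trna_final_um = final_concentration(10.0, 200.0, 400.0)

# Hypothetical filter counts per minute, for illustration only.
retained_percent = percent_of_control(400.0, 1000.0)
```
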

GST pull-down assay. GST pull-down experiments were performed in buffer 
containing 20 mM of HEPES (pH 7.5), 200mM NaCl, 0.5mM EDTA, 1mM DTT, 
10% glycerol, and 0.1% Triton X-100. Equal amounts of GST-ANKRD16 proteins
were incubated with histidine-tagged mouse or human AlaRS (full length and
truncations), E. coli AlaRS, or human TyrRS or TrpRS proteins for 2 h at 4°C
and pulled down by Glutathione-Sepharose 4B beads (GE Healthcare). After 
SDS-PAGE, the proteins were transferred to PVDF membranes using the iBlot 
Dry Blotting System (Invitrogen). The membranes were blocked for 1h with 5% 
non-fat dry milk in Tris-buffered saline with Tween-20. Anti-GST and anti-His-tag 
antibodies were purchased from Cell Signaling. After incubation with primary 
antibodies, the membranes were washed and incubated with HRP-conjugated 
anti-mouse or anti-rabbit secondary antibodies (Cell Signaling), followed by 
detection using enhanced chemiluminescence western-blotting substrate (Thermo 
Scientific). 

Molecular modelling. The mouse ANKRD16 and human AlaRS complex model
was obtained through the Patchdock server and was optimized by manual adjust- 
ments. The mouse ANKRD16 and human AlaRS models were generated by the 
Swiss model server. The human full-length AlaRS model was built on the basis of 
the templates of the human AlaRS structure (PDB 5KNN“), Archaeoglobus fulgidus 
AlaRS and tRNA“" complex (PDB 3WQY“’). AlaRS was assigned as the receptor 
and ANKRD16 was assigned as the ligand. Molecular visualization and analysis 
were performed with PyMOL. 

Mass spectrometry for serinylation of ANKRD16. To identify amino acid 
residues that were serinylated, misacylation assays were performed at 25°C 
for 1 h in a reaction mixture of 5 μM AlaRS(A734E) and 10 μM ANKRD16 in
50 mM HEPES (pH 7.5), 50 mM KCl, 10 mM MgCl₂, 5 mM
tris(2-carboxyethyl)phosphine, 4 μg ml⁻¹ inorganic pyrophosphatase (Roche), 10 mM L-serine, 2 μM
BSA and 10 μM human tRNA(Ala) in the presence or in the absence of 4 mM ATP.
Proteins were reduced with 5 mM tris-(2-carboxyethyl)-phosphine hydrochloride 
(Sigma, C4706) and alkylated with 55 mM 2-chloroacetamide (Fluka Analytical, 
#22790). Proteins were digested for 18h at 37°C in 2 M urea, 100 mM Tris (pH 8.5), 
and 1 mM CaCl₂ with 2 μg trypsin (Promega, V5111). Five-step MudPIT analysis
was performed using an Easy-nLC (ThermoFisher Scientific) and a Q Exactive 
(ThermoFisher Scientific) using an electrospray stage built in-house**. Tandem 
mass spectra were extracted from raw files using RawConverter™ and were 
searched against a Uniprot Mus musculus database with reversed sequences using 
ProLuCID***!, The search space included all fully tryptic peptide candidates with a 
fixed modification of 57.02146 C and differential modification of serine (87.03203) 
or alanine (71.03711) on lysine. 
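
The modification masses in the search parameters correspond to the monoisotopic masses of a carbamidomethyl group (C2H3NO) and of seryl (C3H5NO2) and alanyl (C3H5NO) residues. The sketch below recomputes them from standard monoisotopic atomic masses as a sanity check; the helper name and dictionary layout are illustrative and not part of the ProLuCID configuration.

```python
# Standard monoisotopic atomic masses (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def monoisotopic_mass(composition):
    """Sum monoisotopic masses for an elemental composition such as {"C": 3, "H": 5}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


carbamidomethyl = monoisotopic_mass({"C": 2, "H": 3, "N": 1, "O": 1})  # fixed, on cysteine
seryl = monoisotopic_mass({"C": 3, "H": 5, "N": 1, "O": 2})            # differential, on lysine
alanyl = monoisotopic_mass({"C": 3, "H": 5, "N": 1, "O": 1})           # differential, on lysine

for name, value in [("carbamidomethyl", carbamidomethyl), ("seryl", seryl), ("alanyl", alanyl)]:
    print(f"{name}: {value:.5f}")
```
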

Statistics and reproducibility. No statistical methods were used to predetermine 
sample size. The experiments were not randomized and investigators were not 
blinded to genotype during experiments. P values were computed in GraphPad 
Prism using Student's t-test (two-tailed), one-way or two-way ANOVA and cor- 
rected for multiple comparisons when applicable as indicated in the figure legends. 
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Source Data for quantifications mentioned either in the text or 
shown in graphs plotted in Figs. 1-4 and Extended Data Figs. 1, 3, 4, 5, 7, and 9 are 
available in the online version of this paper. 
Extended Data Fig. 1 | Identification of Ankrd16 as a genetic modifier
of the B6J.Aars(sti) mutation. a, Three- to four-month-old Aars(sti/sti) mice
from crosses to inbred strains, and the numbers affected with ataxia and
Purkinje cell degeneration. b, Location of Msti relative to microsatellite
markers. cM, centimorgan. c, Non-synonymous SNPs in Msti candidate
genes. The amino acids and positions are shown at the top of the table with
the B6 residue listed first. d, RT-PCR analysis of Ankrd16 transcripts from
cerebellar cDNA prepared from C57BL/6J and CAST/Ei mice. Note, the
alternative Ankrd16 transcript containing exon 5’ (see Fig. 2c) amplified by
primers to exon 4/6 or exon 5/7 was present in cDNA from C57BL/6J but
not CAST/Ei mice, whereas the alternative transcript detected by exon 1/3
primers was present in cDNA from both strains. n = 2 biological replicates.
e, RT-PCR analysis of Ankrd16 transcripts from cerebellar cDNA prepared
from C57BL/6J, CAST/Ei, Aars(sti/sti) and Msti(CAST/CAST) Aars(sti/sti) mice.
Note, the alternative Ankrd16 transcript containing exon 5’ (see Fig. 2c)
amplified by exon 5/7 or 3/5’ primers was present in cDNA from C57BL/6J
and Aars(sti/sti), but absent in the presence of CAST/Ei-derived Msti (CAST/Ei
and Msti(CAST/CAST) Aars(sti/sti) mice). n = 3 biological replicates. f, Sequence
of the SNP-containing region in intron 5 of Ankrd16. Upper-case letters
indicate novel exon 5’ and lower-case letters indicate intron 5. The SNP
in non-rescuing strains is shown in red. g, Western blotting analysis of
ANKRD16 from cerebellar lysates. Note that the expression of ANKRD16
is reduced in C57BL/6J and Aars(sti/sti) mice relative to mice with
CAST/Ei-derived Msti (CAST/Ei and Msti(CAST/CAST) Aars(sti/sti) mice) (mean ± s.d.,
n = 3, one-way ANOVA (Tukey correction), ****P < 0.0001). h, Protein
levels of AlaRS and ANKRD16 were determined by western blotting using
various mouse tissues from C57BL/6J and B6 congenic mice heterozygous
for the Msti region (Msti(CAST/B6)). GAPDH is included as a loading control.
Note the increase in ANKRD16 levels in Msti(CAST/B6) tissues, whereas
AlaRS levels do not change between genotypes. n = 2 biological replicates
for all tissues, n = 4 for neuronal tissues. i, PCR results of genomic DNA
from B6 transgenic mouse lines Tg25L9-19 and Tg25L9-46, which carry
the CAST/Ei BAC. Polymorphic markers, which differentiate between
C57BL/6J and CAST/Ei, were used as shown. n = 5 biological replicates.
j, Amino acid sequence comparison of ANKRD16 from various species
with the C57BL/6J strain shown. Non-synonymous SNPs distinguishing
CAST/Ei and C57BL/6J are shown in yellow and serinylated lysines are
shown in red.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


[Extended Data Fig. 2, panels a-f: table of mass spectrometry hits from B6.Tg-CAG Ankrd16-myc tissue (a); co-IP immunoblots from HEK293 cells (b-d); domain diagrams of mouse ANKRD16 and ANKRD29 (d) and of human AlaRS constructs (full length, AAD, ΔAAD and CD) with GST pull-down immunoblots (e); switchSENSE kinetics (f). Tabular content recoverable from the extracted figure is given below; "?" marks values that could not be recovered.]

a
Peptide count   Spectral count   Sequence coverage (%)   Description
82              152              69.0                    Ankyrin repeat containing protein 16 (Ankrd16); bait protein
?               ?                52.9                    Alanyl-tRNA synthetase (Aars); "high rank" interaction candidate (1/15)
9               9                34.0                    Fructose-1,6-bisphosphatase 1 (Fbp1)
?               ?                26.9                    Estradiol 17 beta-dehydrogenase 5 (Akr1c6); only one count (9) survives
?               10               18.0                    Tropomyosin alpha 1 chain (Tpm1)
7               9                18.0                    Tropomyosin beta chain (Tpm2)
7               ?                18.8                    Protein disulfide-isomerase A3 (Pdia3)
?               ?                8.3                     Myosin-1b (Myo1b)
6               6                10.0                    Ubiquitin carboxyl-terminal hydrolase 4 (Usp4)
5               6                13.8                    Glutathione S-transferase Mu 3 (Gstm3)
5               5                21.4                    Glutamine synthetase (Glul)
5               5                15.9                    Acyl-CoA dehydrogenase (Acadm)
5               5                15.0                    Cytochrome P450 2A4 (Cyp2a4)
5               5                14.8                    Sorbitol dehydrogenase (Sord)
5               5                10.8                    Heat shock protein HSP90-alpha (Hsp90aa1)
5               5                10.2                    Alpha-aminoadipic semialdehyde synthase (Aass)
All proteins other than Aars and the bait are labelled "low rank" interaction candidates (14/15).

f
Interaction                     kon (M⁻¹ s⁻¹)   koff (s⁻¹)          Kd (μM)
AlaRS / ANKRD16                 5,310 ± 240     0.0105 ± 0.0001     1.98 ± 0.09
AlaRS(A734E) / ANKRD16          3,985 ± 180     0.0076 ± 0.0003     1.90 ± 0.12
AlaRS(1-455) / ANKRD16          3,660 ± 140     0.0087 ± 0.0002     2.37 ± 0.10
AlaRS(A734E) / ANKRD16(3×R)     1,915 ± 160     0.0103 ± 0.0002     5.41 ± 0.46




Extended Data Fig. 2 | Verification of the interaction between
ANKRD16 and AlaRS in vitro. a, Peptide/spectral counts of proteins
co-immunoprecipitated from transgenic Ankrd16-Myc (see diagram) but
not detected in non-transgenic liver tissue. n = 1 experiment. b, HEK293T
cells were transiently co-transfected with Myc-tagged constructs for
mouse Aars, Aars(A734E), the Aars aminoacylation domain (AAD) and
Flag-tagged Ankrd16. Co-IP experiments were performed with ANKRD16-Flag
as the bait protein. n = 3 independent experiments. c, Reciprocal co-IP
experiments were performed by transiently co-transfecting HEK293T
cells with HA-tagged constructs for mouse Aars or Aars(A734E) and the
Flag-tagged Ankrd16. HA-AlaRS proteins were used as bait for pull down. n = 3
independent experiments. d, Domain structure of mouse ANKRD16 and
ANKRD29. HEK293T cells were transiently transfected with Flag-tagged
constructs for mouse Ankrd16, Ankrd29 and the HA-epitope tagged Aars.
Co-IP experiments were performed with HA-AlaRS as bait protein. n = 2
independent experiments. e, Various domain protein products of
AlaRS (human) as indicated were bacterially expressed, purified, and
incubated with GST-ANKRD16. GST-pull down products and input
were immunoblotted with anti-His or anti-GST antibodies. Asterisks
indicate protein degradation products. n = 2 independent experiments.
f, Binding dynamics were determined between mouse wild-type or mutant
AlaRS and mouse wild-type or mutant ANKRD16 using switchSENSE
(mean ± s.d.; n = 3).
Extended Data Fig. 3 | Analysis of the effects of ANKRD16 on steps
of translation. a, Preassembled ternary complex containing either
Ser-tRNA(Ala) or Ala-tRNA(Ala) was mixed with 70S initiation complex
programmed with codon GCU in the A site, aliquots were transferred at
various times to a quenching solution (0.5 M KOH), and products were
resolved by eTLC. n = 1 experiment. b, Incubation of full-length AlaRS
with preassembled EF-Tu-GTP-Ser-tRNA(Ala) prevents fMet-Ser formation.
EF-Tu-GTP-Ser-tRNA(Ala) ternary complex and 70S initiation complex
were each preassembled. In reaction scheme 1, AlaRS was incubated
with EF-Tu-GTP-Ser-tRNA(Ala) for 15 min, 70S initiation complex was
added, and aliquots were removed at various times for eTLC analysis.
By contrast, in reaction scheme 2, AlaRS was incubated with the 70S
initiation complex for 15 min, followed by addition of
EF-Tu-GTP-Ser-tRNA(Ala), and aliquots were removed at various times for eTLC analysis.
n = 2 independent experiments. c, Deacylated tRNA(Ala) was mixed with
AlaRS, serine, ATP and all other components to form the ternary complex,
aliquots were transferred at various time points to the 70S initiation
complex, and dipeptide products were resolved by eTLC. t = 0 indicates
control reactions in the absence of AlaRS; * indicates oxidized fMet. Note,
without alanine supplementation (b and c), trace amounts of fMet-Ala
are detected, probably due to AlaRS-bound alanyl-AMP during protein
purification. n = 2 independent experiments. d, Deacylation of
[³H]Ser-tRNA(Ala) by mouse wild-type AlaRS or AlaRS(A734E) in the presence or
in the absence of ANKRD16 (mean ± s.d., n = 2, one-phase decay model.
R² values are as follows: wild-type AlaRS, 0.9892; AlaRS(A734E), 0.991;
wild-type AlaRS + ANKRD16, 0.9872; AlaRS(A734E) + ANKRD16,
0.9902; ANKRD16, 0.7992). e, Percentage of EF-Tu retained on
the filter membrane upon the addition of various components as
indicated (mean ± s.d.; n = 2). f, ATP-pyrophosphate exchange by
mouse AlaRS(A734E) in the presence or in the absence of ANKRD16
(mean ± s.d., n = 2, Michaelis-Menten model. R² values: AlaRS(A734E),
0.9763; AlaRS(A734E) + ANKRD16, 0.9874). g, Aminoacylation of
tRNA(Ala) with alanine by mouse AlaRS(A734E) in the presence or in the
absence of ANKRD16 (mean ± s.d., n = 2, Michaelis-Menten model. R²
values: AlaRS(A734E), 0.9879; AlaRS(A734E) + ANKRD16, 0.9846).
Extended Data Fig. 4 | Analysis of serinylation of ANKRD16. a,
tRNA-independent ATPase activity of mouse wild-type AlaRS for serine or
alanine (mean ± s.d., n = 3, Michaelis-Menten model. R² for serine,
0.9926; for alanine, 0.8430). b, tRNA-independent ATPase activity of mouse
wild-type AlaRS or AlaRS(A734E) for serine in the presence of ANKRD16,
ANKRD29, or ANKRD16(3×R) (mean ± s.d., Michaelis-Menten model.
R² values are as follows: wild-type AlaRS, 0.9926; AlaRS(A734E), 0.9899;
AlaRS(A734E) + ANKRD16, 0.9918; AlaRS(A734E) + ANKRD29,
0.9939; AlaRS(A734E) + ANKRD16(3×R), 0.9841. For wild-type
AlaRS, AlaRS(A734E) and AlaRS(A734E) + ANKRD29, n = 3; for
AlaRS(A734E) + ANKRD16 and AlaRS(A734E) + ANKRD16(3×R), n = 4).
c, TLC analysis of ATPase activity of AlaRS(A734E) against serine in the
absence and in the presence of ANKRD16. n = 3 independent experiments.
d, Experimental scheme showing how data were generated for e, f, g
and h. e, Acylation reactions with radioactive alanine were performed
as described above to determine alanine transfer (mean ± s.d., n = 2,
Michaelis-Menten model. R² values: AlaRS(A734E) + tRNA + ANKRD16,
0.9611; AlaRS(A734E) + tRNA + buffer, 0.9641). f, Misacylation of tRNA(Ala)
with radioactive serine by mouse AlaRS(A734E) in the presence or in
the absence (buffer) of ANKRD16. After misacylation, reactions were
subjected to TCA precipitation of RNA (Ser-tRNA(Ala)) and protein under
acidic conditions to maintain Ser-tRNA (mean ± s.d., n = 2,
Michaelis-Menten model. R² values: AlaRS(A734E) + tRNA + ANKRD16, 0.9577;
AlaRS(A734E) + tRNA + buffer, 0.7841). g, Misacylation reactions using
radioactive serine and mouse wild-type AlaRS or AlaRS(A734E) were
performed in the presence or in the absence of ANKRD16, ANKRD29,
ANKRD16(3×R) or tRNA(Ala). After TCA precipitation under neutral pH
conditions, serine was measured (mean ± s.d., n = 4). Note the higher
level of TCA-precipitated serine on tRNA or protein in the presence
of ANKRD16. h, Acylation reactions using radioactive alanine and
mouse AlaRS(A734E) were performed in the presence or in the absence
of ANKRD16 or tRNA(Ala). After TCA precipitation under neutral pH
conditions, alanine was measured (mean ± s.d., n = 4). i, Misacylation
reactions using radioactive serine and mouse AlaRS(A734E) were
performed in the presence of ANKRD16 either with or without tRNA(Ala).
Reactions were treated with or without Na₂CO₃ (final concentration
of 0.15 M, alkaline pH) for 30 min, followed by TCA precipitation.
Precipitated [³H]serine-links were measured and plotted as relative level of
serinylation (mean, n = 2).
Extended Data Fig. 5 | Analysis of serinylation of ANKRD16 by
mass spectrometry. a-d, MS/MS spectra of peptides from ANKRD16.
Incorporation of serine onto ANKRD16 was observed when serine
was misactivated by mouse AlaRS(A734E) (+ATP). Misactivation
was not observed in the absence of ATP. MS/MS spectrum of peptides
from ANKRD16 with serine linked to positions K102 (b), K135 (c) and
K165 (d). a ions (a), b ions (b) and y ions (y) are annotated in green,
red and purple, respectively. The triply charged precursor had a mass
of 2,153.065 daltons and included carbamidomethyl cysteine. For a-d,
n = 1 experiment. e, Secondary structure analysis of mouse ANKRD16
and ANKRD16(3×R). Far-UV circular dichroism spectra of wild-type
ANKRD16 (blue) and ANKRD16(3×R) (red) show highly similar CD
spectra (mean ± s.d., n = 4). f, Thermal-shift analysis of mouse ANKRD16
(blue) and ANKRD16(3×R) (red) shows highly similar thermal stability.
n = 2 independent experiments. g, HEK293T cells were transiently
co-transfected with Myc-tagged constructs for mouse AlaRS(A734E)
and Flag-tagged ANKRD16, ANKRD16(3×R) or ANKRD29. Co-IP
experiments were performed with Flag-tagged proteins as the bait protein.
Binding affinity was determined by normalizing the immunoprecipitation
signal of AlaRS(A734E) to the input signal, in which the interaction of
AlaRS(A734E) + ANKRD16 was arbitrarily defined as 1 to determine
the relative binding affinity of AlaRS(A734E) + ANKRD16(3×R)
(mean ± s.d., n = 3).




[Extended Data Fig. 6: fluorescence micrographs of Aars(sti/sti) fibroblasts expressing ANKRD16-Flag or ANKRD16(3×R)-Flag, cultured in 0.4 mM or 40 mM serine and counterstained with Hoechst.]

Extended Data Fig. 6 | Serine-induced cell death in Aars(sti/sti) fibroblasts.
B6.Aars(sti/sti) embryonic fibroblasts were co-transfected with hrGFP
(humanized recombinant GFP, green) and either ANKRD16-Flag
or ANKRD16(3×R)-Flag (n = 4). 12 h post-transfection, serine was added
and cells were cultured for 24 h before staining with propidium iodide (PI;
red) and Hoechst (blue) to determine cell death. Arrowheads represent
PI⁺GFP⁺ cells. n = 4 biological replicates. Scale bars, 100 μm.
Extended Data Fig. 7 | Analysis of ubiquitous deletion of ANKRD16.
a, Protein levels of ANKRD16 were determined by western blotting of
tissues from B6 mice homozygous for the Msti region (Msti^CAST/CAST),
n = 3 biological replicates. b, Subcellular analysis of ANKRD16 and
AlaRS by cell fractionation of brains from B6.Msti^CAST/CAST and Ankrd16^−/−
mice. Cellular fractions were confirmed using antibodies for histone 3
(nuclear marker), GAPDH (cytosolic marker), COX IV (mitochondrial
marker), GRP78 (endoplasmic reticulum marker), and Sec61 beta (rough
endoplasmic reticulum marker). n = 2 independent experiments. c, To
generate a ubiquitous or conditional loss-of-function allele, loxP sites that
flank exon 2 of Ankrd16 were inserted by homologous recombination.
Removal of exon 2 results in a frame shift and premature stop codon
in exon 3. Ankrd16 was ubiquitously deleted by EIIa-Cre-mediated
removal of exon 2 and the neo cassette. Flippase (Flp)-mediated excision
of the neo cassette was used to generate a conditional loss-of-function
Ankrd16 allele. d, Loss of ANKRD16 was verified by western blotting.
Protein extracts from Ankrd16^−/−, Msti^CAST/CAST and C57BL/6J mice
were used for comparison purposes and GAPDH was used as a loading
control. n = 4 biological replicates. e, The number of embryos or mice
of various genotypes from intercrosses of Aars^sti/+ Ankrd16^−/− mice over
the total number observed. Representative images of E9.5 embryos.
Note, Aars^sti/sti Ankrd16^−/− embryos are smaller and have failed to turn.
E, embryonic day; P, postnatal day. For E8.5 and E9.5, n = 4 litters for
each time point; E10.5 and E12.5, n = 2 litters for each time point; P21,
n = 7 litters. Scale bars, 500 μm.
Extended Data Fig. 8 | Conditional deletion of Ankrd16 accelerates
Purkinje cell loss and causes widespread neurodegeneration in the
B6.Aars^sti/sti cerebellum. a, b, Calbindin D-28 immunohistochemistry
of sagittal cerebellar sections. c, Cresyl violet-stained sagittal cerebellar
sections. Note the presence of interneurons in the molecular layer (ML)
of B6.Aars^sti/sti cerebellum despite the thinning of the molecular layer as
a consequence of Purkinje cell degeneration. By contrast, loss of Ankrd16
in the B6.Aars^sti/sti cerebellum results in degeneration of molecular-layer
interneurons. GL, granule cell layer; PL, Purkinje cell layer. The number
of biological replicates are as follows: for TgPcp2-Cre Ankrd16~Aars*!st 
or En1°*Ankrd16”~ Aars**". P7, n=5; P18, n=4; P21, n=4; P28, n=7; 
P30, n=5; P42, n=4; 3 months, n=5; 7 months, n=6. For Ankrd16^−/−:
7 months, n=7. For Aars^sti/sti and wild-type: 7 months, n=3. Scale bars,
500 μm (a, b, c), 50 μm (c, higher magnification).
Extended Data Fig. 9 | Conditional deletion of Ankrd16 accelerates the
formation of protein aggregates in B6.Aars^sti/sti mice. a, Ubiquitin (Ub;
red), p62 (green), and Calbindin D-28 (Calb; blue) immunofluorescence
on sagittal cerebellar sections. The percentages of aggregate-positive
Purkinje cells and of Purkinje cells are shown (mean ± s.d., n = 3,
multiple t-tests (Holm-Sidak method), ***P = 0.0002466 (Purkinje
cells with aggregates, P21), P = 0.0002336 (Purkinje cells with
aggregates, P28), P = 0.0003214 (percentage of Purkinje cells, P21),
#4 P — 7.701787 x 10° (percentage of Purkinje cells, P28)). Note that 
the percentage of Purkinje cells is relative to control C57BL/6J mice. 

b, Cell type-specific markers (red) and p62 (green) immunofluorescence 


on sagittal cerebellar sections. Parvalbumin (Parv) was used to identify 
Purkinje cells and interneurons (stellate and basket cells) in the molecular 
layer. NeuN was used to distinguish between granule and Golgi cells in 
the granule cell layer. En1° Ankrd16~ Aars**"’, Golgi cell (closed arrow 
head, p62+/NeuN−); granule cell (arrow, p62+/NeuN+); basket or stellate
cell (open arrow head, p62+/Parv+). n = 3 biological replicates.

c, Ubiquitin (red) and p62 (green) immunofluorescence on sagittal 
sections of the cortex (layer IV). n = 3 biological replicates. Scale bars, 
10 μm (a), 50 μm (b and c, low magnification) and 10 μm (b and c, higher
magnification). 
Extended Data Fig. 10 | Model for the role of ANKRD16 in translational
fidelity. a, A point mutation in the editing domain of AlaRS(A734E)
results in editing defects, as indicated by deficits in tRNA-independent
ATPase activity, which in turn leads to increased levels of incorrectly
aminoacylated Ser-tRNA^Ala, misincorporation of serine during translation,
protein aggregation and cell death. b, Interaction of ANKRD16 with the
aminoacylation domain of editing-deficient AlaRS(A734E) stimulates
tRNA-independent editing, and misactivated serines are transferred onto
ANKRD16. Mitigation of serine misactivation prevents sti-mediated
mistranslation, and thereby prevents protein aggregation and cell death.
SWI2/SNF2 ATPase CHR2 remodels pri-miRNAs via Serrate to impede miRNA production

Zhiye Wang, Zeyang Ma, Claudia Castillo-González, Di Sun, Yanjun Li, Bin Yu, Baoyu Zhao, Pingwei Li & Xiuren Zhang


Chromatin remodelling factors (CHRs) typically function to alter chromatin structure. CHRs also reside in 
ribonucleoprotein complexes, but little is known about their RNA-related functions. Here we show that CHR2 (also
known as BRM), the ATPase subunit of the large switch/sucrose non-fermentable (SWI/SNF) complex, is a partner of 
the Microprocessor component Serrate (SE). CHR2 promotes the transcription of primary microRNA precursors (pri- 
miRNAs) while repressing miRNA accumulation in vivo. Direct interaction with SE is required for post-transcriptional 
inhibition of miRNA accumulation by CHR2 but not for its transcriptional activity. CHR2 can directly bind to and unwind 
pri-miRNAs and inhibit their processing, and this inhibition requires the remodelling and helicase activity of CHR2 in 
vitro and in vivo. Furthermore, the secondary structures of pri-miRNAs differed between wild-type Arabidopsis thaliana 
and chr2 mutants. We conclude that CHR2 accesses pri-miRNAs through SE and remodels their secondary structures, 
preventing downstream processing by DCL1 and HYL1. Our study uncovers pri-miRNAs as a substrate of CHR2, and an
additional regulatory layer upstream of Microprocessor activity to control miRNA accumulation. 


CHR2 is the ATPase subunit of the large SWI/SNF chromatin-remodelling 
complex. Such complexes are known to remodel chromatin structures 
by nucleosome sliding, eviction, or histone variant exchange using 
energy derived from ATP hydrolysis'. Animal CHR2 can also associate 
with nascent mRNA ribonucleoprotein complexes (pre-mRNPs)*”. 
The yeast chromatin remodeller ISW1 is involved in the surveillance 
of nuclear mRNP biogenesis*. Some SWI/SNF members can also 
bind to long-noncoding RNAs (lncRNAs) and participate in assembly
of lncRNA-dependent nuclear bodies or regulation of chromatin
association® *. These studies suggest that CHR2 and other SWI/SNF 
subunits have additional roles in RNA biology. Notably, these roles do 
not require the remodelling activity of the relevant ATPases. Whether 
and how SWI/SNF factors participate in post-transcriptional processing 
of RNA is unknown. 

MicroRNAs, a large family of small non-coding regulatory RNAs’, 
are processed from long pri-miRNAs that contain a hairpin-like fold- 
back by Microprocessor and Dicing complexes!™!!. The minimal plant 
Microprocessor—Dicing complex includes Dicer-like 1 (DCL1), and 
a double-stranded (ds)RNA-binding protein, Hyponastic leaves 1 
(HYL1)”. The Microprocessor initially cleaves basal flanking segments 
of pri-miRNAs to generate precursor miRNAs (pre-miRNAs), and 
subsequently cuts pre-miRNAs to produce miRNA/* duplexes (where 
the asterisk represents a miRNA complementary strand)!%. Although 
miRNAs are derived from pri-miRNAs, transcript levels of pri-miRNAs 
do not always correlate with miRNA abundance, possibly owing to 
various post-transcriptional regulations and processing efficiencies of 
the RNA species!#!#}5, 

SE, a zinc-finger protein, has also been considered to be a core 
component of plant Microprocessor, because se mutations cause 
pri-miRNA accumulation and miRNA loss in vivo'®. Although some 
have argued for a direct role for SE in promoting the enzymatic activity 


and accuracy of DCL1'”'§, processing of pri-miRNAs by DCL1 in vitro
does not seem to require SE”; rather, SE might act as a scaffold to 
recruit the processing machinery including DCL1 and HYL1 to proper 
RNA substrates to produce miRNAs in vivo!*!920_ Ars2, the mammalian 
orthologue of SE, also participates in miRNA-dependent silencing, 
suggesting that the function of SE has been conserved throughout 
eukaryotes”!””. However, the mechanism by which SE contributes to 
miRNA production has been unclear. 


CHR2 has two roles in miRNA accumulation 

We identified CHR2 as a bona fide partner of SE (Fig. 1a, b, Extended
Data Fig. 1a–d, Supplementary Information). We then investigated
whether CHR2 participated in SE-mediated miRNA production.
Adult chr2-1 mutant plants exhibited short stature and downward
curled leaves, whereas se mutants had small serrated and upward
curled leaves (Extended Data Fig. 1e, f). Small RNA (sRNA) blot assays
showed that the amounts of the tested miRNAs, except miR390, were
greatly increased in leaves and flowers of chr2-1−/− compared to wild-type
(Col-0) plants or their heterozygote siblings (chr2-1+/−) (Fig. 1c,
Extended Data Fig. 1g–i). Moreover, both morphological and miRNA
abnormalities in chr2-1 plants were fully complemented by a
PCHR2-gCHR2-Flag-4Myc (PCHR2-gCHR2-FM) transgene, showing that the
lack of CHR2 causes both morphological and miRNA abnormalities
(Extended Data Fig. 1j, k).

We compared miRNA profiles in chr2-1 and Col-0 plants through 
deep-sequencing analysis. We normalized sequencing read counts to 
miR390, which consistently showed no changes between chr2-1 and 
Col-0 plants in either sRNA blots or sequencing (see Methods). We 
found that 114 of 365 annotated miRNAs were upregulated at least 
1.5-fold in chr2-1 relative to Col-0 plants (Fig. 1d, Supplementary 
Table 1). We focused on the 259 canonical SE-dependent miRNAs and 
Fig. 1 | CHR2 is a novel partner of SE and represses miRNA
accumulation. a, b, Y2H (a) and co-immunoprecipitation (IP) (b) assays
show the specific CHR2-SE interaction. AD, GAL4-activation domain;
BD, GAL4-DNA-binding domain; AD or BD-protein, AD or BD-fusion
proteins; CHR2-FM, chr2-1;PCHR2-gCHR2-Flag-4Myc. c, sRNA blot
analysis shows that miRNAs accumulated in chr2-1 mutant plants.
d, sRNA sequencing analysis shows that a substantial number of miRNAs
accumulated in chr2-1 mutants. Compared to Col-0, miRNAs with at
least 1.5-fold higher (chr2-1/Col-0 > 1.5) or lower (Col-0/chr2-1 > 1.5)
expression in chr2-1 are indicated by orange dots and blue dots,
respectively. The grey dots indicate differences in expression level
< 1.5-fold (ratio < 1.5).


found that approximately one-quarter of these SE-dependent miRNAs, 
representing about 59% of CHR2-repressed miRNAs, were significantly 
enhanced in chr2-1 plants (Extended Data Fig. 1l, m, Supplementary
Table 1). Thus, CHR2 is a previously unrecognized negative regulator 
of miRNA accumulation for a large subset of miRNAs in Arabidopsis. 
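
The normalization and fold-change classification described for the sRNA sequencing data can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the toy read counts are hypothetical, and only the miR390 reference and the 1.5-fold threshold are taken from the text.

```python
def normalize_to_reference(counts, reference="miR390"):
    """Divide every read count by the count of the reference miRNA."""
    ref = counts[reference]
    if ref <= 0:
        raise ValueError("reference miRNA must have a positive read count")
    return {name: value / ref for name, value in counts.items()}


def classify_mirnas(mutant_counts, wild_type_counts,
                    reference="miR390", threshold=1.5):
    """Label each miRNA 'up', 'down' or 'unchanged' in the mutant.

    'up' means at least `threshold`-fold higher in the mutant,
    'down' means at least `threshold`-fold lower.
    """
    mutant = normalize_to_reference(mutant_counts, reference)
    wild_type = normalize_to_reference(wild_type_counts, reference)
    labels = {}
    for name, m in mutant.items():
        if name == reference or name not in wild_type:
            continue
        w = wild_type[name]
        if m > 0 and m >= threshold * w:
            labels[name] = "up"
        elif w > 0 and w >= threshold * m:
            labels[name] = "down"
        else:
            labels[name] = "unchanged"
    return labels


# Toy read counts (hypothetical values, for illustration only).
col0 = {"miR390": 1000, "miR159a": 2000, "miR164b": 500, "miR166a": 800}
chr2_1 = {"miR390": 500, "miR159a": 3000, "miR164b": 300, "miR166a": 200}

labels = classify_mirnas(chr2_1, col0)
for name, label in sorted(labels.items()):
    print(name, label)
```

Dividing by the reference first makes libraries of different depth comparable, which is why a miRNA with fewer raw reads in the mutant can still be classed as unchanged or upregulated.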

We then investigated the transcription levels of miRNA genes (MIRs) 
in chr2 mutants through PMIR-FM-GUS reporter assays and meas- 
urement of nascent pri-miRNA levels. Notably, unlike the effects on 
miRNA accumulation, MIR gene transcription was reduced in chr2 
mutants (Extended Data Fig. 2, Supplementary Table 2, Supplementary 
Information). Thus, CHR2 must have an additional inhibitory role in 
miRNA biogenesis, beyond its canonical function in promoting MIR 
transcription. 


Uncoupled CHR2 roles in miRNA biogenesis

To study whether the functions of CHR2 in miRNA biogenesis depend 
on SE, we mapped the CHR2-SE interaction interface. Using yeast 
two-hybrid (Y2H) assays with dozens of CHR2 mutants, we found 
that CHR2(E1747A) abolished the CHR2-SE interaction (Fig. 2a, 
Extended Data Fig. 3a-e, Supplementary Information). Thus, E1747 of 
CHR2 is essential for the specific CHR2-SE interaction. We then trans- 
formed PCHR2-gCHR2(E1747A)-FM into chr2-1 plants and obtained 
T2 homozygotes. Notably, CHR2(E1747A) only partially rescued the 
morphological and miRNA defects in chr2-1 plants (Fig. 2b, c). This 
result indicated that the direct CHR2-SE interaction is required for 
inhibition of miRNA accumulation in vivo by CHR2. 

Through genetic crossing, we obtained chr2-1;PCHR2-gCHR2 
(E1747A)-FM;PMIR159a-FM-GUS homozygotes. The expression 
level of the GUS reporter was comparable in chr2-1;PCHR2-gCHR2 
(E1747A)-FM and Col-0 backgrounds (Fig. 2d, Extended Data Fig. 3f). 
Thus, the E1747A mutation does not alter the transcriptional function 
of CHR2. This result also suggests that enhanced miRNA accumulation 
in the chr2-1;PCHR2-gCHR2(E1747A)-FM hypomorphic line results 
Fig. 2 | Uncoupling of transcriptional and post-transcriptional
regulatory roles of CHR2 in miRNA biogenesis. a, Y2H assay shows that
CHR2(E1747) is required for its interaction with SE. b, c, CHR2(E1747A)
partially rescued the morphological and miRNA accumulation defects
seen in chr2-1 plants. b, Scale bars, 1 cm. c, sRNA blot analysis (top) and
western blot analysis of CHR2 or its variant (bottom). d, Western blot
analysis of GUS protein from PMIR159a-FM-GUS homozygotes shows
that CHR2(E1747A) does not alter the transcriptional level of MIR159a in
Col-0 and chr2-1;PCHR2-gCHR2(E1747A)-FM backgrounds (n > 30).


from compromised inhibition of SE-mediated miRNA biogenesis by 
CHR2(E1747A). Thus, the transcriptional and post-transcriptional reg- 
ulatory roles of CHR2 in miRNA biogenesis can be uncoupled via SE. 


CHR2 inhibits pri-miRNA processing in vitro

CHR2 does not affect the expression of miRNA metabolism compo- 
nents or SE-dependent pre-mRNA splicing (Extended Data Fig. 4, 
Supplementary Table 3, Supplementary Information). To test whether 
CHR2 directly inhibits SE-mediated pri-miRNA processing, we 
established an in vitro Microprocessor system using recombinant DCL1, 
SE, and HYL1. The recombinant Microprocessor (DCL1-HYL1-SE)
cleaved pri-miR166f as efficiently and accurately as DCL1 immuno- 
precipitates from plants!” (Extended Data Fig. 5a, b). Thus, the recom- 
binant Microprocessor can recapitulate in vivo miRNA biogenesis. 
Notably, incubation of purified CHR2, but not a control (hSTING)*, 
with pri-miRNAs before application of HYL1-SE substantially 
inhibited Microprocessor activity (Fig. 3a, Extended Data Fig. 5c). 
CHR2-mediated inhibition of Microprocessor activity was concentration- 
dependent and also enhanced by prolonged incubation with 
pri-miRNA (Extended Data Fig. 5d, e). By contrast, treatment with 
CHR2 after pri-miRNA incubation with HYL1-SE only slightly 
inhibited DCL1 activity (Fig. 3b, Extended Data Fig. 5f). Thus, 
Microprocessor, when loaded with pri-miRNAs, could largely 
bypass CHR2 inhibition in vitro. This result also suggests that CHR2 
impedes miRNA biogenesis by acting upstream of Microprocessor 
activity in vivo. 

Microprocessor cleavage activity was also substantially inhibited 
by CHR2(E1747A) that was pre-incubated with pri-miRNAs, but to 
a lesser extent than by CHR2 (Fig. 3c, Extended Data Fig. 5g). This 
result indicates that CHR2-mediated inhibition of Microprocessor 
activity in vitro results from features other than its SE-binding ability. 
By contrast, addition of CHR2(E1747A) to pri-miRNAs that had been
pre-incubated with SE had only a marginal effect on miRNA produc- 
tion (Fig. 3d, Extended Data Fig. 5h). Thus, if SE binds to pri-miRNA 
before CHR2, obstruction of the CHR2-SE interaction compromises 
CHR2-mediated inhibition of pri-miRNA processing in vitro. Notably, 
Fig. 3 | In vitro Microprocessor assays show that CHR2 inhibits
pri-miRNA processing. a, Pre-incubation of CHR2 with pri-miRNAs 
inhibited pri-miRNA processing. b, Pre-incubation of HYL1-SE with 
pri-miRNAs largely bypassed the CHR2-mediated inhibition of pri- 
miRNA processing. c, Pre-incubation of pri-miRNA with CHR2(E1747A) 
substantially reduced pri-miRNA processing. d, CHR2(E1747A) 
marginally inhibited pri-miRNA processing when pri-miRNA was pre- 
incubated with SE. Molar ratio of SE:CHR2 (or CHR2(E1747A)) was 4:1. 
a-d, Left, reconstitution orders; right, quantification of relative cleavage 
efficiency with s.d. from three replicates. 


these results are consistent with the observation that CHR2 requires SE 
to inhibit miRNA accumulation in vivo (Fig. 2c). Thus, we conclude 
that CHR2 obtains pri-miRNAs from SE to fulfil its inhibition of pri- 
miRNA processing. 


CHR2 binds to pri-miRNAs 

CHR2 does not block access of DCL1-HYL1 to SE to process pri-miRNAs
(Model 1, Extended Data Figs. 6, 7, Supplementary Information).
Next, we asked whether CHR2 sequestered pri-miRNAs from SE, 
thwarting their handover to DCL1-HYL1 (Model 2, Extended Data 
Fig. 6a). Electrophoretic mobility shift assays (EMSAs) showed that 
CHR2 bound to pri-miRNAs and pre-miRNAs (apparent dissociation
constant (Kd) = 18.78 ± 0.70 nM; Hill coefficient (nH) = 3.87) as
strongly as to double-stranded DNA (dsDNA) (apparent Kd ≈ 16 nM,
nH = 4.27) (Fig. 4a, Extended Data Fig. 8a–f). The sigmoidal
CHR2-nucleic acid binding curve (Fig. 4a, Extended Data Fig. 8f) suggests
cooperativity between multiple nucleic acid binding sites in CHR2”4 
in substrate binding. 
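
To illustrate why a Hill coefficient near 4 gives a sigmoidal curve, the following Python sketch evaluates the standard Hill equation, fraction bound = [P]^nH / (Kd^nH + [P]^nH), using the apparent Kd and nH reported above for pri-miRNA. It assumes this textbook form of the equation; the text does not state the exact model used for curve fitting, and the function name is hypothetical.

```python
def hill_fraction_bound(protein_nm, kd_nm, hill_coefficient):
    """Fraction of nucleic acid bound at a given protein concentration (nM).

    Uses the standard Hill equation, in which kd_nm is the protein
    concentration that gives half-maximal binding.
    """
    if protein_nm < 0 or kd_nm <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    x = protein_nm ** hill_coefficient
    return x / (kd_nm ** hill_coefficient + x)


KD_NM = 18.78   # apparent Kd of CHR2 for pri-miRNA, from the text
N_H = 3.87      # Hill coefficient, from the text

# Binding at half and twice the apparent Kd, with and without cooperativity.
concentrations = (KD_NM / 2, KD_NM, KD_NM * 2)
cooperative = [hill_fraction_bound(c, KD_NM, N_H) for c in concentrations]
non_cooperative = [hill_fraction_bound(c, KD_NM, 1.0) for c in concentrations]

for c, coop, non in zip(concentrations, cooperative, non_cooperative):
    print(f"{c:6.2f} nM  nH=3.87: {coop:.3f}  nH=1: {non:.3f}")
```

With nH = 1 the fraction bound rises gradually, whereas with nH = 3.87 binding switches from mostly unbound to mostly bound over a narrow concentration range around the apparent Kd, which is the sigmoidal behaviour interpreted as cooperativity.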

SE and HYL1 can also bind to pri-miRNAs'**° with binding affinities
of approximately 50 and 0.7 nM, respectively (Fig. 4b, Extended
Data Fig. 8g, h). These results indicate that CHR2 binds to pri-miRNAs
with a higher affinity than SE but with a substantially lower affinity than
HYL1. The results also suggest that CHR2 has the potential to compete
with SE, but not with HYL1, for pri-miRNAs. However, EMSA showed
that co-incubation of CHR2 and SE with pri-miRNAs resulted in a 
shifting pattern different from the mobility of either CHR2-pri-miRNA 
or SE-pri-miRNA complexes (Fig. 4c). These results suggest that 
CHR2, SE, and pri-miRNA can form stable complexes with a distinct 
electrophoretic mobility. Alternatively, CHR2 may alter the pri-miRNA 
structure and thus, the mobility of the SE-pri-miRNA complex. By 
contrast, whenever HYL1 was applied, individually or combined with 
other proteins, the mobility of the pri-miRNA-protein complexes 
Fig. 4 | CHR2 binds to pri-miRNAs in vitro and in vivo. a, b, The binding
curves of CHR2 (a), SE and HYL1 (b) to pri-miRNA. The (apparent) Kd
values were calculated from the EMSA image quantification with s.d. from
three experiments. c, EMSA shows that CHR2, SE and pri-miRNA form
a ribonucleoprotein complex. d, RIP assay shows that CHR2 binds to
pri-miRNAs in vivo. The relative signal of pri-miRNAs was calculated with s.d.
from three biological repeats (*P < 0.05; **P < 0.01; unpaired, two-tailed
Student’s t-test).


was always identical to that of HYL1-pri-miRNA alone (Extended 
Data Fig. 8i). This result suggests that HYL1 could readily sequester 
pri-miRNAs from SE and/or CHR2 in vitro, and that CHR2 and SE 
could not preclude pri-miRNA handover to DCL1-HYL1 for processing.
This result is also consistent with the observation that inhibition
of Microprocessor activity by CHR2 can be largely attenuated when 
pri-miRNAs are pre-incubated with HYL1-SE (Fig. 3b). 

Next, we performed ribonucleoprotein immunoprecipitation (RIP) 
experiments using hyl1-2 chr2-1;PCHR2-gCHR2-FM homozygotes con- 
taining unprocessed pri-miRNAs (Extended Data Fig. 8j). The RIP 
results showed that pri-miR159a, pri-miR159b, pri-miR164b, and pri- 
miR166a, which are the main contributors to their respective mature 
miRNAs, were all significantly enriched in the CHR2 immunopre- 
cipitate, whereas this pattern was not observed for other pri-miRNAs 
(pri-miR164a and pri-miR164c) that do not contribute substantially 
to mature miRNAs (Fig. 4d). Moreover, treatment of the samples with 
RNases before reverse transcription did not yield any signals, suggest- 
ing that the enrichment was specifically from CHR2-bound RNAs 
(Extended Data Fig. 8k). Together, these results validated the theory 
that CHR2 binds to many species of nascent pri-miRNAs in vivo, fur- 
ther suggesting that CHR2 acts on SE-bound pri-miRNAs before their 
handover to DCL1-HYL1 in vivo. 


CHR2 remodels SE-bound pri-miRNAs 

Plants containing the hypomorphic chr2-1a allele with compromised 
ATPase activity retained enhanced miRNA accumulation compared 
to Col-0 plants (Extended Data Fig. 2d-g). Thus, inhibition of miRNA 
accumulation by CHR2 involves its ATPase activity. We next hypoth- 
esized that CHR2 might unwind self-complementary pri-miRNAs 
and alter their hairpin structures (Model 3, Extended Data Fig. 6a). 
To test this hypothesis, we conducted CHR2 remodelling assays with 
32P-labelled but nicked pri-miR166f. CHR2, but not HYL1, clearly
released a large portion of the 26-nucleotide (nt) RNA fragment from
the nicked pri-miR166f (Fig. 5a, Extended Data Fig. 9a–d). Therefore,
Fig. 5 | CHR2 remodels pri-miRNAs to prevent their processing.
a, b, Time-course of remodelling and helicase activities of CHR2 and
variants on pri-miRNAs (a) and dsDNA (b). c, Thin layer chromatography
(TLC) assays show ATPase activities of CHR2 and CHR2(G1009A/K1012A)
with or without dsDNA or pri-miRNA. d, Right, Microprocessor
assay shows that CHR2(G1009A/K1012A) did not inhibit pri-miRNA
processing. Left, reconstitution order. In a-d, image quantifications
were calculated with s.d. from three replicates. e, DMS-primer extension
assays show nucleotides (red arrows) in pri-miR164b with stronger DMS


CHR2 does unwind pri-miRNAs, in a way similar to dsDNA (Fig. 5b, 
Extended Data Fig. 9e, f). These results were unexpected, as SWI/SNF 
complexes are mainly thought to lack helicase activity’. 

Similar to other SWI2/SNF2 ATPases”®’, CHR2 had only basal 
ATPase activity in the ground state. Nonetheless, dsDNA and 
pri-miRNA stimulated its ATPase activity by approximately 25-fold 
and 12-fold, respectively (Fig. 5c, Extended Data Fig. 9g-i). The lower 
stimulation of ATPase activity by pri-miRNA relative to dsDNA 
was probably because it harbours mispairings and internal loops. 
Notably, mutations in the CHR2 ATP-binding site (G1009A/K1012A 
or N1392A/R1417A) diminished the dsDNA- and/or pri-miRNA- 
stimulating ATPase activities of CHR2 by threefold (Fig. 5c, Extended 
Data Fig. 2e, 9g-i). Consistently, CHR2(G1009A/K1012A) and the 
CHR2 nucleotide-binding mutant (CHR2(D1355A/R1385A)), but not 
CHR2(E1747A), also strongly reduced the CHR2 remodelling/helicase 
activity (Fig. 5a, b, Extended Data Fig. 9c-f). These results indicate that 
the unwinding of dsDNA and pri-miRNAs by CHR2 is driven by ATP 
hydrolysis. In addition, the CHR2 mutants with alterations in the ATP- 
and nucleotide-binding sites, but not the E1747A mutant, had slightly 
reduced affinity for pri-miRNAs relative to wild-type CHR2 (Extended 
Data Fig. 10a). Thus, the pri-miRNA binding and remodelling activities 


activities in hyl1-2 chr2-1;PCHR2-gCHR2-FM plants than in hyl1-2 chr2-1a
plants. f, DMS-MaPseq analysis shows that CHR2 alters nucleotide
pairings in upper stem and base region of pri-miR159b. Top, average 
mutation frequencies of A and C are plotted along pri-miRNA sequence. 
The regions within green dashed boxes correspond to zoomed-in 
secondary structures modelled from the DMS activities (bottom). Colour- 
coded nucleotides had different DMS activities between Col-0 and chr2-1 
plants. miRNA and * strands are shown in purple and cyan, respectively. 

g, Proposed model for inhibition of pri-miRNA processing by CHR2. 


of CHR2 depend on its ATPase domain, but not on the SE interaction 
interface. 

We further performed Microprocessor assays using the remodelling- 
defective CHR2(G1009A/K1012A). As CHR2 is likely to access 
pri-miRNAs through SE in vivo (Fig. 2c), we pre-incubated SE protein 
with pri-miRNAs before the application of CHR2 or CHR2(G1009A/ 
K1012A). Whereas CHR2 substantially inhibited Microprocessor 
activity, CHR2(G1009A/K1012A) had negligible effects on pri-miRNA 
processing (Fig. 5d, Extended Data Fig. 10b). The loss of inhibition of 
pri-miRNA processing was probably not caused by the slightly reduced
affinity of CHR2(G1009A/K1012A) for pri-miRNAs compared to wild-type
CHR2 (apparent Kd ≈ 28 nM for the mutant versus ≈ 19 nM for wild-type CHR2) because
CHR2 and SE are in the same complex with pri-miRNAs (Fig. 4c).
Rather, the result indicates that the CHR2 remodelling activity is critical 
for inhibition of pri-miRNA processing in vitro. 

To study how the remodelling mutations affect miRNA accumulation 
in vivo, we created chr2-1;PCHR2-gCHR2(G1009A/K1012A)-FM and 
chr2-1;PCHR2-gCHR2(D1355A/R1385A)-FM transgenic lines. Notably, 
the morphological and miRNA accumulation defects seen in chr2-1 
plants were partially rescued in these transgenic plants (Extended Data 
Fig. 10c-e). We also introduced the PMIR159b-FM-GUS transgene into 
the chr2-1;PCHR2-gCHR2(G1009A/K1012A)-FM hypomorphic line by
crossing. Notably, the CHR2 variant reduced MIR gene transcription 
in chr2-1;PCHR2-gCHR2(G1009A/K1012A)-FM;PMIR159b-FM-GUS 
homozygotes (Extended Data Fig. 10f). Thus, the increase in miRNA 
abundance in the partial complementation lines should result from 
compromised post-transcriptional inhibition of pri-miRNA processing 
by CHR2 variants that lack the remodelling function. 


CHR2 alters pri-miRNA folding in vivo 

To determine whether CHR2 remodels pri-miRNAs in vivo, we probed 
RNA secondary structures through a dimethyl sulfate (DMS)-based
method”S (Extended Data Fig. 11a–c). DMS methylates the base-pairing
faces of adenosine (A) and cytidine (C) of RNA in loops, bulges, 
mismatches and joining regions, and such methylation precludes the 
reverse transcription reaction. Strikingly, the reactivity of DMS with 
pri-miR164b differed between transgenic plants expressing wild-type 
CHR2 and those expressing CHR2-1a, which has defective ATPase 
activity (Fig. 5e, Extended Data Fig. 2d-g, 11d). Enhanced DMS 
modifications, detected by premature termination of reverse transcrip- 
tion, were observed in the miRNA/* duplex and the proximal upper 
stem of pri-miR164b in CHR2 plants compared to chr2-1a plants. This 
result indicates that there are more unpaired or unprotected nucleotides 
in the duplex region of pri-miR164b in a CHR2 background than in 
the chr2 mutant. 

Finally, we adopted a method of DMS mutational profiling with 
Illumina sequencing (DMS-MaPseq) to target multiple RNAs in 
vivo”? (Extended Data Fig. 11a). DMS-MaPseq uses a special reverse
transcriptase enzyme from a group II self-splicing intron (TGIRT)*”. 
TGIRT can read through the DMS-modified A and C template bases 
and insert mismatches into the corresponding cDNAs, thereby 
providing signatures for mapping of DMS modifications along reverse 
transcription products”. We targeted 16 pri-miRNAs, and eight were 
successfully and consistently amplified between two biological repli- 
cates. Notably, DMS-MaPseq showed meaningful differences in DMS 
reactivity of the eight pri-miRNAs, but not the control UBQ4 mRNA, 
between Col-0 and chr2-1 plants (Fig. 5f, Extended Data Fig. 11e–h, 12).
First, pri-miR166a and pri-miR168a tended to form shortened upper 
stems and branched terminal loops in Col-0 plants, whereas they 
displayed extended upper stems and linear terminal loops in chr2-1 
plants (Extended Data Fig. 11g, h). As multi-branched terminal loops 
and shortened upper stems can cause loop-to-base abortive processing 
of pri-miRNAs whereas elongated upper stems promote productive 
processing of pri-miRNAs!, CHR2 might remodel the structures of 
terminal loops and upper stems to control the productive processing 
of pri-miRNAs. Second, modelling of pri-miR168b folding showed that 
the lower stem was increased from 9 nt in Col-0 plants to 16 nt in chr2-1 
plants (Extended Data Fig. 12a). Such an increase places miR168b/* 
an ideal distance of 15-17 nt from a reference single-strand basal loop 
region”, allowing accurate processing of pri-miR168b. Finally, pri- 
miR159 and pri-miR319 are sequentially processed from terminal loop 
to base to eventually release miRNA/**!. In chr2-1 plants, pri-miR159b 
and pri-miR319b had stronger DMS reactivity in the upper stems, sug- 
gesting lesser folding or more unpaired nucleotides in these regions and 
potentially increasing pri-miRNA processing from the terminal loop to 
the lower base (Fig. 5f; Extended Data Fig. 12b). Furthermore, the lower 
stem of pri-miR159b was also altered so that miR159/* was immedi- 
ately adjacent to a bigger and more basal loop in chr2-1 plants (Fig. 5f). 
This conformation is likely to further promote accurate and efficient 
processing of miRNA/* from pre-miRNAs, as both animal Dicer and 
plant DCL1 recognize the loop–bulge structures in addition to the
5′ and 3′ ends of pre-miRNAs for accurate processing'***. Thus, CHR2
alters the in vivo secondary structures of pri-miRNAs or, possibly, their 
association with cellular components in various ways. 
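
The idea behind the DMS-MaPseq readout, that DMS-modified A and C bases appear as mismatches in the cDNA, can be sketched in Python as a per-position mismatch frequency. This is a simplified illustration only: it assumes reads already aligned to the reference without gaps and given as equal-length strings, with '-' marking positions a read does not cover. The function name, the toy sequence and the toy reads are hypothetical, and real analyses work from alignment files and apply further filtering.

```python
def dms_mutation_frequencies(reference, aligned_reads):
    """Per-position mismatch frequency at A and C bases of the reference.

    Returns one value per reference position: a float for A or C
    positions with coverage, and None for G/U positions or positions
    that no read covers.
    """
    frequencies = []
    for i, ref_base in enumerate(reference):
        if ref_base not in "AC":
            # DMS reports on A and C only, so other bases are skipped.
            frequencies.append(None)
            continue
        covered = 0
        mismatched = 0
        for read in aligned_reads:
            base = read[i]
            if base == "-":  # read does not cover this position
                continue
            covered += 1
            if base != ref_base:
                mismatched += 1
        frequencies.append(mismatched / covered if covered else None)
    return frequencies


# Toy example (hypothetical sequence and reads).
reference = "GACUA"
reads = ["GACUA", "GGCUA", "GAUUA", "GACU-"]

frequencies = dms_mutation_frequencies(reference, reads)
print(frequencies)
```

A higher mismatch frequency at a given A or C suggests that the base was more often unpaired or unprotected, so comparing these profiles between genotypes points to regions whose folding differs.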


Discussion 
Here we report a non-canonical function of CHR2 in miRNA biogen- 
esis: while CHR2 positively regulates transcription of MIR loci owing 
to its conventional chromatin remodelling activity, it can also directly
inhibit post-transcriptional processing of pri-miRNAs. On the basis of 
our results, we propose a model in which CHR2 inhibits miRNA pro- 
duction by obtaining pri-miRNAs from SE and remodelling the RNA 
substrates, rendering them unsuitable for processing by Microprocessor 
(Fig. 5g). Thus, the SWI2/SNF2 ATPase goes beyond its canonical sub- 
strates of nucleosome! and Polycomb complexes*%, and acts directly on 
its new substrate of pri-miRNAs. The biological significance of these 
self-opposing functions in miRNA biogenesis is that CHR2 provides 
a new and balanced regulatory mechanism to stabilize the supply of 
proper substrates upstream of Microprocessor activity. In this scenario, 
CHR2-SE may serve as a specific reader and modifier of sequences and 
structures of pri-miRNAs and fine-tune miRNA biogenesis in vivo. 
This mechanism may also partially explain frequently observed incon- 
sistencies between the expression of MIR genes and the abundance of 
mature miRNAs'*). 

Several cellular components, including transcription factors, can 
increase or decrease Microprocessor activity by modulating or seques- 
tering core or regulatory Microprocessor components**~*’. However, 
CHR2 can act as both a positive and a negative regulator of miRNA 
biogenesis, and the net outcome depends on the balance of two oppo- 
site forces. Thus, CHR2 represents a new paradigm for the regulation of 
miRNA biogenesis by exerting opposing effects in consecutive biolog- 
ical processes (pri-miRNA transcription and processing) through the 
conformation of different protein complexes (SWI/SNF and CHR2- 
SE), acting on different molecular substrates (chromatin and structured 
RNA, respectively). As CHR2 and SE/Ars2 are conserved throughout 
eukaryotes, it would be interesting to learn whether such a mechanism 
exists beyond Arabidopsis. Moreover, whether CHR2 and SE/Ars2 participate 
in remodelling of pre-mRNPs and lncRNA complexes besides 
pri-miRNAs would also be an exciting topic for future investigation. 
METHODS 

Plant materials and growth conditions. Arabidopsis thaliana ecotype Columbia 
(Col-0), chr2-1 (SALK_030046), se-1 (CS3257), se-2 (SAIL_44_G12), se-3 
(SALK_083196), hyl1-2 (SALK_064863) and dcl1-9 (CS3828) were used for this 
study. 

For generating complementation and point mutation (partial complementation) 
lines, the heterozygous chr2-1+/− mutant was transformed with binary vectors of 
PCHR2-gCHR2-FM or its derivatives by the floral dip transformation method™*. 
In T2 transgenic plants, the homozygous chr2-1 background was genotyped by 
PCR using specific primers (Supplementary Table 4); transgenic plants harbouring 
dual-tagged CHR2 and its derivatives were screened by western blot analysis. The 
pleiotropic phenotype of chr2-1 was fully rescued in the T2 progeny of the PCHR2- 
gCHR2-FM transgenic plants, indicating that dual-tagged CHR2 is as functionally 
active as wild-type CHR2. 

Double mutants such as chr2-1a se-3, chr2-1a hyl1-2 and chr2-1a dcl1-9 
were obtained by crossing the partial complementation line of chr2-1;PCHR2- 
gCHR2(N1392A/R1417A)-FM #49 (termed chr2-1a) with heterozygous se-3, 
homozygous hyl1-2 and heterozygous dcl1-9 lines respectively. In the F2 genera- 
tion, homozygous double mutants were identified by PCR using primers listed in 
Supplementary Table 4. 

The hyl1-2 chr2-1;PCHR2-gCHR2-FM material was obtained by crossing the 
complementation line of chr2-1;PCHR2-gCHR2-FM #8 with hyl1-2 plants. In the F2 
generation, homozygous hyl1-2 chr2-1;PCHR2-gCHR2-FM plants were identified 
by PCR using primers listed in Supplementary Table 4. 

For GUS histochemical analysis, heterozygous chr2-1+/− plants were transformed 
with the MIR promoter-driven FM-GUS vectors (PMIR159a-FM-GUS, 
PMIR159b-FM-GUS, PMIR164a-FM-GUS, PMIR164b-FM-GUS and PMIR164c-FM-GUS) 
by floral transformation®*. In T2 transgenic plants, homozygous 
chr2-1−/− plants were identified by phenotypic segregation. 
chr2-1;PCHR2-gCHR2(E1747A)-FM;PMIR159a-FM-GUS and 
chr2-1;PCHR2-gCHR2(G1009A/K1012A)-FM;PMIR159b-FM-GUS plants were obtained by 
crossing chr2-1;PCHR2-gCHR2(E1747A)-FM #7 and chr2-1;PCHR2-gCHR2(G1009A/K1012A)-FM 
#7 lines with chr2-1+/−;PMIR159a-FM-GUS #3 and chr2-1+/−;PMIR159b-FM-GUS 
#5 lines, respectively. In the F2 generations, homozygous double mutants were 
identified by PCR using primers listed in Supplementary Table 4. 

Plants were grown as previously described*’. Wild-type (Col-0), mutants and 
transgenic lines were grown under a 12 h light/12 h dark cycle. Typically, 
three-week-old plants were harvested for various assays, including photography, 
sRNA blot and western blot analyses, unless specifically mentioned. No statistical 
methods were used to predetermine sample size. Randomization and blinding were 
not relevant to this study. 

Vector construction. Most coding DNA sequences (CDSs) and genomic sequences 
were cloned into pENTR/D-TOPO (Thermo Fisher) vectors and confirmed by 
sequencing. The majority of plant binary constructs were made using the Gateway 
system (Thermo Fisher). The primers for all constructs are listed in Supplementary 
Table 4. 

PCHR2-gCHR2-FM-3’UTR and its derivatives were constructed as follows: 
first, three truncations of the CHR2 genomic fragment were amplified and cloned 
into pENTR/D vectors to generate pENTR-gCHR2-Truncation1-3; then, 
pENTR-gCHR2-Truncation1 and 2 were digested with XhoI/AscI and ligated to create 
pENTR/D-gCHR2-Truncation1+2; later, pENTR-gCHR2-Truncation1+2 and 
Truncation 3 were digested with BamHI/AscI and ligated to create 
pENTR/D-gCHR2. Second, a fragment of Flag-4Myc (FM) was amplified and inserted 3’ of 
the DC cassette in a pBA002a-FM-DC vector’ to generate pBA002a-FM-DC-FM. 
Subsequently, the 3’ UTR of CHR2 (493 bp) was amplified and cloned into the 
SpeI/SacI-digested pBA002a-FM-DC-FM vector to yield 
pBA002a-FM-DC-FM-3’UTR. In parallel, the promoter of CHR2 (1,903 bp) was amplified and cloned into 
the XbaI/EcoRV-digested pBA002a-FM-DC to yield pBA002a-PCHR2-FM-DC. 
The resultant pBA002a-PCHR2-FM-DC vector was digested with ApaI/XbaI and 
ligated to the same enzyme-treated pBA002a-FM-DC-FM-3’UTR to produce 
pBA002a-PCHR2-FM-DC-FM-3’UTR. The plasmid was digested with XbaI/AscI 
to remove the FM tag in front of the DC cassette, blunted by Klenow treatment, 
and self-ligated to create the pBA002a-PCHR2-DC-FM-3’UTR vector. Finally, the 
full-length CHR2 genomic fragment from a pENTR vector was transferred 
into pBA002a-PCHR2-DC-FM-3’UTR by Gateway attL-attR (LR) recombination 
reaction (Thermo Fisher) to yield pBA002a-PCHR2-gCHR2-FM-3’UTR. 

Site-specific mutations were introduced into CHR2 by PCR using 
pENTR-gCHR2 truncations as templates. Then the fragments containing point 
mutations were swapped into pENTR-gCHR2 using restriction enzyme digestion 
followed by ligation. Finally, the CHR2 genomic fragment containing site-specific 
point mutations was transferred into pBA002a-PCHR2-DC-FM-3’UTR by LR 
reaction to yield pBA002a-PCHR2-gCHR2 mutants-FM-3’UTR. 

PMIR-FM-GUS vectors were constructed as follows: First, approximately 1.8-2.4-kb 
promoters of MIR159a, MIR159b, MIR164a, MIR164b and MIR164c were amplified 
using pairs of primers listed in Supplementary Table 4, and cloned into pENTR/D 
vectors. Second, pBA002a-FM-DC-FM (as above) was treated with XbaI/AscI to 
remove the FM tag in front of the DC cassette, blunted by Klenow treatment, and 
self-ligated to produce the pBA002a-DC-FM vector; then, GUS CDS was amplified 
from pBA-GUS and cloned into the XhoI/PacI-digested pBA002a-DC-FM to create 
pBA002a-DC-FM-GUS. Finally, the promoters of MIR genes were transferred into 
pBA002a-DC-FM-GUS to yield PMIR-FM-GUS vectors. 

P35S-FM-SE and PSE-FM-SE were constructed as follows: For P35S-FM-SE, SE 
CDS was transferred into pBA-FM-DC” by LR reaction to yield the P35S-FM-SE 
construct. For PSE-FM-SE, the SE promoter (2,241 bp) was amplified and cloned 
into BamHI/XbaI-digested pBA002a-FM-DC to yield pBA002a-PSE-FM-DC. 
Then the SE CDS was transferred into pBA002a-PSE-FM-DC by LR reaction to 
yield PSE-FM-SE vectors. 

P35S-CHR2-YFP and P35S-SE-CFP were constructed as follows: For 
P35S-CHR2-YFP, first, two truncations of CHR2 CDS were amplified and cloned into 
pCR-BluntII-TOPO vectors. Next, NotI/NdeI-digested CHR2 truncation 1 and 
NdeI/AscI-digested CHR2 truncation 2 were ligated into NotI/AscI-digested 
pENTR/D to yield pENTR/D-CHR2. Then, CHR2 CDS was transferred into 
pBA-DC-YFP by LR reaction to create a P35S-CHR2-YFP vector. For P35S-SE-CFP, the SE 
CDS was transferred into pBA-DC-CFP by LR reaction to create P35S-SE-CFP 
vectors. 

GST-CHR2-957-1823aa-6×His and its variants were constructed as follows: 
CHR2-957-1823aa-6×His and its derivatives were amplified from pENTR-CHR2 
and pENTR-CHR2 mutants using sets of primers: a 5’ forward primer containing an 
EcoRI digestion site and a 3’ reverse primer containing a 6×His tag sequence and 
a SalI digestion site (Supplementary Table 4). Then the PCR products 
were cloned into an EcoRI/SalI-digested pGEX-4T-1 vector to produce 
pGEX-4T-1-GST-CHR2-957-1823aa-6×His and pGEX-4T-1-GST-CHR2-957-1823aa 
derivatives-6×His. 

GST-6×His-DCL1, 6×His-SUMO-HYL1 and 6×His-SUMO-SE were constructed 
as follows: For the GST-6×His-DCL1 construct, pBA002a-Flag-DCL1 was 
digested with XbaI/SpeI and blunted with Klenow treatment. Next, the Flag-DCL1 
blunt fragment was ligated into SmaI-digested pAcGHLT-C (BD Biosciences) to 
yield the pAcGHLT-GST-6×His-DCL1 construct. For the 6×His-SUMO-SE 
construct, SE CDS was amplified and ligated into a BamHI/XbaI-digested 
pET28a-Avi-6×His-SUMO vector® to obtain the pET28a-Avi-6×His-SUMO-SE construct. 
For the 6×His-SUMO-HYL1 construct, HYL1 CDS was amplified from the 
pBA-Myc-HYL1 plasmid” and digested with BamHI/SacI. The resultant fragment was 
ligated into the BamHI/SacI-treated pET28a-6×His-SUMO vector“ to produce 
the pET28a-6×His-SUMO-HYL1 construct. 

Yeast two-hybrid (Y2H) and three-hybrid (Y3H) vectors were constructed as 
follows: For Y2H vectors, full-length and truncation derivatives of SE, DCL1, HYL1 
and CHR2 or CHR2 variants were cloned into vectors of pGADT7-GW, which 
harbours a GAL4 activation domain (AD), and pGBKT7-GW, which harbours a 
DNA binding domain (BD), using the LR reaction. Similarly, full-length AtPrp40b 
CDS was amplified and cloned into a pENTR/D-TOPO vector and confirmed 
by sequencing. Then the AtPrp40b CDS was cloned into a pGADT7-GW vector 
using the LR reaction. For Y3H vectors, a synthesized DNA fragment containing 
NotI, KpnI, SpeI, NcoI, AscI and BglII digestion sites was digested by NotI and 
BglII and cloned into a NotI/BglII-digested pBridge (Clontech) vector to 
produce pBridge-MCSII. Then the SE CDS amplified from pENTR/D-SE was cloned 
into the pBridge-MCSII vector by EcoRI/BamHI to yield pBridge-BD-SE-MCSII. 
Finally, the CHR2 CDS fragment was obtained from pENTR-CHR2 using 
NotI/AscI digestion and cloned into the NotI/AscI-digested pBridge-BD-SE-MCSII to 
produce the pBridge-BD-SE-CHR2 vector. 

Gel filtration chromatography of protein extracts from plants. Nine-day-old 
P35S-FM-SE seedlings were harvested and homogenized in extraction buffer (20 mM 
Tris-HCl pH 7.5, 150 mM NaCl, 4 mM MgCl2, 75 µM ZnCl2, 1% glycerol, 1 pellet 
per 12.5 ml Complete EDTA-free protease inhibitor (Roche), 1 mM PMSF, and 
15 µM MG132). The homogenates were centrifuged twice at 15,000 r.p.m. for 
15 min at 4°C, and the final supernatant was filtered through a 0.2-µm filter. Next, 
the total protein extracts were injected into an ÄKTA FPLC system, and the 
proteins were fractionated through a Superdex 200 10/300 GL column (GE Healthcare). 
Fractions were collected for western blot analysis of FM-SE protein. The Superdex 
200 column was also calibrated with a gel filtration standard (Bio-Rad). 
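
Column calibration of this kind is usually turned into a size estimate by fitting a straight line of log10(molecular weight) against elution volume for the standards. The sketch below illustrates that general idea only; the function names and the standard values are hypothetical and are not taken from this study.

```python
import math


def fit_calibration(standards):
    """Fit log10(MW) = a + b * Ve by ordinary least squares.

    standards: list of (molecular_weight_kDa, elution_volume_ml) pairs.
    Returns the intercept a and slope b.
    """
    xs = [ve for _, ve in standards]
    ys = [math.log10(mw) for mw, _ in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_mw(elution_ml, a, b):
    """Apparent molecular weight (kDa) of a peak eluting at elution_ml."""
    return 10 ** (a + b * elution_ml)


# Hypothetical standards (kDa, ml), chosen only to make the example readable.
example_standards = [(1000.0, 8.0), (100.0, 12.0), (10.0, 16.0)]
a, b = fit_calibration(example_standards)
print(f"apparent MW at 14 ml: {estimate_mw(14.0, a, b):.1f} kDa")
```

The estimate is only meaningful inside the range spanned by the standards, and it reports an apparent size that also reflects the shape of the complex.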

Immunoprecipitation and mass spectrometry. In brief, nine-day-old wild-type 
and P35S-FM-SE transgenic seedlings were harvested and ground in liquid 
nitrogen. Total proteins were extracted from 10 g ground powder in 40 ml 
immunoprecipitation buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 4 mM MgCl2, 75 µM 
ZnCl2, 1% glycerol, 1 pellet per 12.5 ml Complete EDTA-free protease inhibitor 
(Roche), 1 mM PMSF, and 15 µM MG132). After being cleared by 
ultracentrifugation, the protein complexes were immunoprecipitated using anti-FLAG M2 
magnetic beads (Sigma, Cat#: M8823) at 4°C for 2 h. After incubation, anti-FLAG 
M2 magnetic beads were washed four times with immunoprecipitation buffer for 
5 min each at 4°C, and protein complexes were then eluted by competitive 3×Flag 
peptide (100 µg/ml). The recovered protein complexes were re-suspended in the 
immunoprecipitation buffer and subsequently immunoprecipitated with 
anti-c-Myc-agarose affinity gel (Sigma, Cat#: A7470) at 4°C for 1.5 h. Next, the beads were 
washed four times with immunoprecipitation buffer to remove nonspecifically bound 
proteins. Finally, the protein complexes were eluted in elution buffer (5 mM EDTA, 
200 mM NH4OH) for mass spectrometry analysis in the Taplin Mass Spectrometry 
Facility at Harvard Medical School. 

Y2H and Y3H assays. The Matchmaker Gold Y2H system (Clontech) was used for 
Y2H assays. The constructs were co-transformed into the Y2H Gold yeast strain 
(Clontech) and selected on medium lacking leucine, tryptophan, histidine and ade- 
nine for Y2H assays. The pBridge construct (Clontech) was used for Y3H assays. 
The constructs were co-transformed into the Y2H Gold yeast strain (Clontech) 
and selected on medium lacking leucine, tryptophan, histidine and methionine but 
supplemented with 5 mM 3-amino-1,2,4-triazole (3-AT) for Y3H assays. For both 
Y2H and Y3H assays, positive colonies were picked and spotted onto the selective 
plates in 1:5 (for Y2H) or 1:10 (for Y3H) serial dilutions and photographed. 

Co-immunoprecipitation. Three-week-old plants on soil were sampled and 
ground in liquid nitrogen. Total proteins were extracted from 1 g ground powder 
in 5 ml immunoprecipitation buffer (40 mM Tris-HCl pH 7.5, 50 mM KCl, 5 mM 
MgCl2, 5 mM DTT, 0.2 mM EDTA, 0.2% Triton X-100, 2% glycerol, 1 mM PMSF, 
25 µM MG132, 1 pellet per 10 ml Complete EDTA-free protease inhibitor (Roche)); 
then, protein extracts were immunoprecipitated with anti-FLAG M2 magnetic 
beads at 4°C for 2 h; for RNase treatment, 250 µl RNase A (1 mg/ml) was added to 
5 ml immunoprecipitation buffer during incubation. After incubation, the beads 
were washed four times with immunoprecipitation buffer at 4°C for 5 min before 
application of SDS loading buffer at 95°C for 10 min. 

Confocal microscopy. Protoplast preparation and transfection were performed 
as described*!. Transfected Arabidopsis thaliana protoplasts were imaged on an 
Olympus FV 1000 confocal microscope. 

Luciferase complementary imaging assay (LCI assay). The CDSs of CHR2 and 
SE were cloned into pCAMBIA-NLuc and pCAMBIA-CLuc, respectively, by the LR 
reaction. Then the constructs were transformed into A. tumefaciens strain GV3101. 
LCI assays were performed as described”. 

sRNA and western blot analyses. sRNA and western blot assays were performed 
as described*®. The sequences of the oligo probes for sRNA blot assays are listed 
in Supplementary Table 4. Western blot assays of CHR2-FM and its variants were 
typically conducted with an anti-Myc antibody throughout the study (Sigma, Cat#: 
C3956). Western blot assays of FM-GUS protein were performed with an anti-Flag 
antibody (Sigma, Cat#: F1804). Other endogenous protein-specific antibodies used 
in the study included anti-actin (Sigma, Cat#: A0480), anti-HYL1 (from B. Yu's 
laboratory), anti-DCL1 (from B. Yu's laboratory) and anti-SE (Agrisera, Cat#: AS09 
532A). Secondary antibodies were goat-developed anti-rabbit and anti-mouse IgG 
(GE Healthcare, Cat#: NA934 and NA931). 

sRNA sequencing and bioinformatics. Total RNA was prepared from three- 
week-old plants grown on soil using TRI Reagent (Sigma). Construction of sRNA 
libraries, Illumina sequencing and bioinformatic analysis were performed as 
described*’. To date, 371 miRNAs have been annotated in Arabidopsis, and we 
initially calculated their expression levels (reads per million, RPM) based on total 
reads. This analysis revealed that 33 miRNAs were significantly over-accumulated 
in the chr2-1 mutant. However, the RPM values of numerous miRNAs including 
the founding members (miR159 and miR166) did not match the results that were 
consistently obtained from sRNA blot analysis throughout our experiments. This 
observation suggested that high-throughput sequencing using standard barcodes 
might introduce a bias against many species of miRNAs, as reported previously**4. 
To solve this issue, we normalized expression reads of miRNAs based on miR390, 
which consistently showed no changes between chr2-1 and Col-0 plants in both 
sRNA blots and sRNA sequencing assays. 
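
The two normalization schemes described above can be sketched as follows: reads per million scales each count by library size, whereas the reference-based scheme divides each count by the miR390 count in the same library before comparing genotypes. This is a minimal illustration under those assumptions, not the authors' pipeline; the function names and the read counts in the example are hypothetical.

```python
def reads_per_million(counts, total_reads):
    """Scale raw miRNA read counts to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: c * 1_000_000 / total_reads for name, c in counts.items()}


def normalize_to_reference(counts, reference="miR390"):
    """Express each miRNA count relative to a reference miRNA in the same library."""
    ref = counts[reference]
    if ref <= 0:
        raise ValueError("reference miRNA has no reads")
    return {name: c / ref for name, c in counts.items()}


def fold_change(mutant_counts, wild_type_counts, reference="miR390"):
    """Mutant/wild-type ratio after reference normalization in each library."""
    mut = normalize_to_reference(mutant_counts, reference)
    wt = normalize_to_reference(wild_type_counts, reference)
    return {
        name: mut[name] / wt[name]
        for name in mut
        if name in wt and wt[name] > 0
    }


# Hypothetical counts, for illustration only.
col0 = {"miR390": 100, "miR159": 1000}
chr2_1 = {"miR390": 200, "miR159": 4000}
print(fold_change(chr2_1, col0))
```

Because the reference is divided out of both libraries, its own fold change is 1 by construction, so the approach depends on independent evidence (here, the sRNA blots) that the reference really is unchanged.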

GUS histochemical analysis. Three-week-old transgenic plants were sampled and 
placed into a sodium phosphate solution containing 2 mM X-Gluc. The plants were 
vacuum infiltrated for 45 min and incubated at 37°C overnight. After removal of 
chlorophyll, pictures of plants were taken with an Olympus SZH10 stereo micro- 
scope. 

RNA extraction, reverse transcription, quantitative real-time PCR (RT-qPCR), 
and semiquantitative PCR. Total RNA was extracted from three-week-old 
plants grown on soil using TRI Reagent (Sigma). cDNA synthesis, quantitative real-time 
PCR and semiquantitative PCR were performed as previously described*’. Primers 
are listed in Supplementary Table 4. 

Alternative splicing analysis. Alternative splicing in chr2-1 was examined using 
rMATS software*®. 

Gene ontology analysis. Gene ontology analyses were performed with the agriGO 
toolkit®®. 

Expression and purification of recombinant proteins. GST-6×His-DCL1 
protein was expressed in a baculovirus/insect cell expression system, whereas 
6×His-SUMO-HYL1, 6×His-SUMO-SE, GST-CHR2-957-1823aa-6×His and 
its derivatives were expressed in Escherichia coli BL21 (DE3) cells. All protein 
purification was performed at 4°C, and purified proteins were finally frozen in liquid 
nitrogen and stored at −80°C. 

For expression and purification of DCL1, pAcGHLT-GST-6×His-DCL1 was 
co-transfected with BaculoGold baculovirus DNA (BD Biosciences, Cat#: 554740) 
into Sf9 insect cells (BD Biosciences, Cat#: 554738; authenticated by the vendor BD 
Biosciences) to generate recombinant baculovirus according to the manufacturer's 
instructions. The recombinant viruses were amplified for two rounds, and P3 virus 
was collected for large-scale protein expression. P3 virus was added to 2.5 x 10° 
Sf9 insect cells per ml for propagation, and insect cells were collected 62 h later. 

The DCL1 protein was purified by two-step affinity chromatography (Ni-NTA 
affinity and glutathione S-transferase affinity) followed by gel filtration 
chromatography. The cell pellet was re-suspended in lysis buffer (100 mM Tris-HCl pH 
8.0, 300 mM KCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 1% Triton 
X-100, 1 pellet per 50 ml Complete EDTA-free protease inhibitor (Roche)) and 
disrupted with a high-pressure homogenizer (AVESTIN, Cat#: EF-C3). After 
centrifugation and filtration with a 0.4-µm membrane, the cleared lysate was supplemented 
with 20 mM imidazole and loaded on a HisTrap HP column (GE Healthcare, Cat#: 
17-5248-02). The column was washed with 25 ml wash buffer (40 mM Tris-HCl 
pH 8.0, 300 mM KCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 80 mM 
imidazole) and eluted with gradient elution buffer from 80 to 150 mM imidazole 
(40 mM Tris-HCl pH 8.0, 300 mM KCl, 2% glycerol, 1 mM β-mercaptoethanol, 
1 mM PMSF). The peak fractions containing recombinant GST-6×His-DCL1 
proteins were pooled and dialysed with 2 litres GST dialysis buffer (40 mM Tris-HCl 
pH 7.5, 300 mM KCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 1 mM 
DTT) at 4°C for 4 h. Then the dialysed fractions were supplemented with 1% 
Triton X-100 and loaded on a GSTrap HP column (GE Healthcare, Cat#: 
17-5282-01). The column was washed with 25 ml wash buffer (40 mM Tris-HCl pH 7.5, 
300 mM KCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 1 mM DTT) 
and eluted with gradient elution buffer from 0 mM to 15 mM reduced glutathione 
(40 mM Tris-HCl pH 8.0, 300 mM KCl, 2% glycerol, 1 mM β-mercaptoethanol, 
1 mM PMSF, 1 mM DTT). The peak fractions were collected and treated with 
thrombin (Calbiochem, Cat#: 605157-1KU) at 4°C overnight to remove the 
GST-6×His tag. The fractions were concentrated with a 50 kDa molecular weight cut-off 
(MWCO) centricon (Millipore, Cat#: UFC905024) and loaded onto a HiLoad 26/600 
Superdex 200 pg column (GE Healthcare). The gel filtration buffer contained 
40 mM Tris-HCl pH 7.5, 300 mM KCl, 5 mM β-mercaptoethanol and 2 mM DTT. The 
peak fractions containing DCL1 were collected and dialysed with one litre dialysis 
buffer (20 mM Tris-HCl buffer pH 7.5, 40 mM KCl, 2 mM β-mercaptoethanol, 
2 mM DTT, 50% glycerol) at 4°C for 6 h. The final purified protein was quantified 
by SDS-PAGE and aliquoted for storage at −80°C. 

For expression of recombinant proteins in E. coli, transformed BL21 (DE3) cells 
were grown in Terrific Broth (TB) for recombinant CHR2 and its variants or in 
Luria Broth (LB) for recombinant SE and HYL1 proteins. Cells were grown at 37°C 
until OD600nm = 0.6. Expression of recombinant proteins was typically induced with 
0.5 mM IPTG at 16°C overnight. 

For purification of HYL1, the induced bacterial cells were collected and 
re-suspended in lysis buffer (40 mM Tris-HCl buffer pH 8.0, 300 mM KCl, 2% glycerol, 
1 mM β-mercaptoethanol, 1 mM PMSF, 1% Triton X-100) and disrupted with a 
high-pressure homogenizer (AVESTIN). After centrifugation and filtration with a 
0.4-µm membrane, the cleared lysate was supplemented with 20 mM imidazole 
and loaded onto a HisTrap HP column (GE Healthcare, Cat#: 17-5248-02). The 
column was washed with 25 ml wash buffer (40 mM Tris-HCl pH 8.0, 300 mM KCl, 
2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 80 mM imidazole) and eluted 
with gradient elution buffer from 80 to 150 mM imidazole (40 mM Tris-HCl pH 
8.0, 300 mM KCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF). The peak 
fractions containing the recombinant 6×His-SUMO-HYL1 proteins were pooled 
and treated with SUMO protease at 4°C overnight to remove the 6×His-SUMO 
tag. The fractions were concentrated with a 50 kDa molecular weight cut-off (MWCO) 
centricon (Millipore) and loaded onto a HiLoad 26/600 Superdex 200 pg column 
(GE Healthcare). The gel filtration buffer contained 20 mM Tris-HCl pH 7.4, 
50 mM KCl, 2 mM β-mercaptoethanol and 2 mM DTT. The peak fractions 
containing HYL1 were collected and concentrated again with a 50 kDa MWCO 
centricon (Millipore). The HYL1 protein was supplemented with 
50% glycerol, frozen in liquid nitrogen and stored at −80°C. 

For purification of 6×His-SUMO-SE, the induced bacterial cells were collected 
and re-suspended in lysis buffer (20 mM Tris-HCl buffer pH 8.5, 300 mM KCl, 2% 
glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 1% Triton X-100, 1 pellet per 50 ml 
Complete EDTA-free protease inhibitor (Roche)) and disrupted with a 
high-pressure homogenizer (AVESTIN). After centrifugation and filtration (0.4-µm filter), 
the cleared lysate was supplemented with 20 mM imidazole and loaded on a 
HisTrap HP column (GE Healthcare, Cat#: 17-5248-02). The column was washed 
with 25 ml wash buffer (20 mM Tris-HCl pH 8.5, 300 mM KCl, 2% glycerol, 1 mM 
β-mercaptoethanol, 1 mM PMSF, 80 mM imidazole) and eluted with gradient 
elution buffer from 80 to 150 mM imidazole (20 mM Tris-HCl pH 8.5, 300 mM 
KCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF). The peak fractions 
containing the recombinant 6×His-SUMO-SE proteins were pooled, concentrated 
with a 50 kDa molecular weight cut-off (MWCO) centricon (Millipore) and loaded 
onto a HiLoad 26/600 Superdex 200 pg column (GE Healthcare). The gel filtration 
buffer contained 20 mM Tris-HCl pH 8.5, 300 mM KCl and 5 mM 
β-mercaptoethanol. The peak fractions containing 6×His-SUMO-SE were collected and dialysed 
with one litre dialysis buffer (20 mM Tris-HCl buffer pH 8.5, 150 mM KCl, 5 mM 
β-mercaptoethanol, 50% glycerol) at 4°C for 6 h. The final purified protein was 
quantified by SDS-PAGE and aliquoted for storage at −80°C. 

For purification of GST-CHR2-957-1823aa-6×His, the induced bacterial cells 
were collected and re-suspended in lysis buffer (50 mM sodium phosphate buffer 
pH 8.0, 500 mM NaCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 1% 
Triton X-100) and disrupted with a high-pressure homogenizer (AVESTIN). After 
centrifugation and clearing with a 0.4-µm filter, the cleared lysate was 
supplemented with 20 mM imidazole and loaded onto a HisTrap HP column (GE 
Healthcare, Cat#: 17-5248-02). The column was washed with 25 ml wash buffer 
containing 80 mM imidazole, followed by 25 ml wash buffer containing 150 mM 
imidazole (50 mM sodium phosphate buffer pH 8.0, 500 mM NaCl, 2% glycerol, 
1 mM β-mercaptoethanol, 1 mM PMSF). Then bound proteins were eluted with 
gradient elution buffer from 150 to 300 mM imidazole (50 mM sodium phosphate 
buffer pH 8.0, 500 mM NaCl, 2% glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF). 
The peak fractions containing the recombinant GST-CHR2-957-1823aa-6×His 
proteins were pooled and dialysed with 2 litres GST dialysis buffer (50 mM sodium 
phosphate buffer pH 7.6, 500 mM NaCl, 2% glycerol, 1 mM β-mercaptoethanol, 
1 mM PMSF, 1 mM DTT) at 4°C for 4 h. Then fractions were supplemented with 
1% Triton X-100 and loaded onto a GSTrap HP column (GE Healthcare, Cat#: 
17-5282-01). The column was washed with 25 ml wash buffer (50 mM sodium 
phosphate buffer pH 7.6, 500 mM NaCl, 2% glycerol, 1 mM β-mercaptoethanol, 
1 mM PMSF, 1 mM DTT) and eluted with gradient elution buffer from 0 to 15 mM 
reduced glutathione (50 mM sodium phosphate buffer pH 8.0, 500 mM NaCl, 2% 
glycerol, 1 mM β-mercaptoethanol, 1 mM PMSF, 1 mM DTT). The peak fractions 
were collected and dialysed with one litre dialysis buffer (50 mM sodium 
phosphate buffer pH 8.0, 500 mM NaCl, 5 mM β-mercaptoethanol, 2 mM DTT, 50% 
glycerol) at 4°C for 6 h. The final purified protein was quantified by SDS-PAGE 
and aliquoted for storage at −80°C. The same purification protocol was applied 
for purification of CHR2 derivatives. 

ATPase assay. The method for the ATPase assay was modified from a previous 
description”’. For Extended Data Fig. 2e, 1 pmol recombinant CHR2 or its 
derivatives was added to ATP hydrolysis reactions (20 µl) containing 20 mM sodium 
phosphate buffer pH 7.0, 25 mM KCl, 5 mM MgCl2, 2 mM DTT, 500 ng dsDNA and 
0.1 µl γ-32P ATP (PerkinElmer, 3,000 Ci/mmol). Note: the final pooled 
concentration of NaCl and KCl in the reaction systems was about 75 mM. Reactions were 
incubated at 37°C for 2 h. For Fig. 5c and Extended Data Fig. 9g-i, the reactions 
were carried out in 20 mM Tris-HCl pH 7.5, 5 mM MgCl2, 2 mM DTT, 0.3% 
NP-40 and 1 U/µl SUPERase-In RNase Inhibitor (for pri-miRNA-stimulated ATPase 
assay, Thermo Fisher). The final pooled concentration of NaCl and KCl in the 
reaction systems was ~60 mM, of which 50 mM was from CHR2 protein dialysis 
buffer and 10 mM was from dsDNA- or pri-miRNA-dissolved buffer. Recombinant 
CHR2 or its derivatives (40 nM) was pre-incubated with a dsDNA fragment 
(the same as in dsDNA EMSA) or with annealed pri-miR166f for 10 min before 
addition of 50 nM γ-32P ATP and 100 nM cold ATP. Reactions were incubated at 
37°C for the indicated times. Reactions were stopped by addition of 12.5 mM 
EDTA and 2.5 mM cold ATP and placing on ice. For measurement of the 
substrate-regulated ATPase activity, 0.5 µM dsDNA or 1 µM pri-miR166f was added 
to the reactions. Liberated phosphate was analysed by thin layer chromatography 
(TLC) and phosphorimaging. The images were quantified with ImageQuant TL 
(GE Healthcare) and the ATP hydrolysis rates of CHR2 and its ATPase mutant 
were calculated using the data of the 60 min reaction for Fig. 5c. 
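
The text does not spell out how a hydrolysis rate is derived from the TLC signal. One common approach, assumed here, is to take the fraction of radioactivity in the free-phosphate spot, multiply by the total ATP concentration, and divide by enzyme concentration and reaction time. The sketch below uses the concentrations stated above (150 nM total ATP, 40 nM enzyme, 60 min); the spot intensities and function names are hypothetical.

```python
def fraction_hydrolysed(pi_signal, atp_signal):
    """Fraction of label in the free-phosphate spot of one TLC lane."""
    total = pi_signal + atp_signal
    if total <= 0:
        raise ValueError("lane has no signal")
    return pi_signal / total


def hydrolysis_rate(pi_signal, atp_signal, atp_nM, enzyme_nM, minutes):
    """ATP molecules hydrolysed per enzyme molecule per minute.

    Assumes hydrolysis is roughly linear over the chosen time window and that
    labelled and cold ATP are hydrolysed equally.
    """
    if enzyme_nM <= 0 or minutes <= 0:
        raise ValueError("enzyme_nM and minutes must be positive")
    hydrolysed_nM = fraction_hydrolysed(pi_signal, atp_signal) * atp_nM
    return hydrolysed_nM / (enzyme_nM * minutes)


# Hypothetical spot intensities (arbitrary units) for a 60 min reaction.
rate = hydrolysis_rate(pi_signal=40.0, atp_signal=60.0,
                       atp_nM=150.0, enzyme_nM=40.0, minutes=60.0)
print(f"{rate:.3f} ATP per enzyme per min")
```

A background lane without enzyme would normally be subtracted first; that step is omitted here to keep the sketch short.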

In vitro transcription and 5’ labelling of RNA/DNA. In vitro transcription and 
5’ labelling of RNA/DNA substrates including pri-miR166f were performed as 
described’”. The substrate of 5’ labelled dsDNA for EMSA is a PCR fragment 
containing the T7 promoter followed by the pri-miR166f sequence. The 5’ labelled 
ssDNA is a synthesized primer: 5’-(A)o7CCCTATAGTGAGTCGTATTA-3’. 
The substrate of 5’ labelled single-stranded (ss)RNA is an ssRNA (100 nt) in vitro 
transcribed using T7-G3A97_Rev primer as a template. 

In vitro Microprocessor assay. We added 1 pmol recombinant DCL1, 2 pmol 
HYL1, 2 pmol SE, 0.5 pmol CHR2 or its derivatives and 1,000 counts per minute 
(c.p.m.) of RNA substrate to 30 µl assay buffer containing 20 mM Tris-HCl 
pH 7.5, 50 mM KCl, 4 mM MgCl2, 1 mM DTT, 5 mM ATP, 1 mM GTP and 
1 U/µl SUPERase-In RNase Inhibitor (Thermo Fisher). The final pooled 
concentration of NaCl and KCl was ~70 mM, of which ~20 mM salt was from protein 
dialysis buffer and RNA-dissolved buffer. The reconstitution assay was carried 
out at 37°C. The reactions were stopped by adding 1 volume of TBE-Urea sample 
buffer (Bio-Rad), heating at 95°C for 10 min and then chilling on ice. 
The DCL1-processed products were fractionated on a 10% denaturing 
polyacrylamide gel and detected overnight with a phosphor imaging plate (GE Healthcare). 
The images were quantified with Gel-Pro Analyzer (Media Cybernetics) 
software. 

Electrophoretic mobility shift assays. Recombinant proteins and labelled RNA 
or DNA were mixed in the EMSA buffer (20 mM Tris-HCl pH 7.5, 2 mM MgCl2, 
2 mM DTT, 5 mM ATP, 0.3% NP-40, 1 U/µl SUPERase-In RNase Inhibitor (for 
RNA EMSA, Thermo Fisher)). The final pooled concentration of NaCl and KCl 
was ~55 mM, of which 50 mM was from protein dialysis buffer and 5 mM from 
DNA- or RNA-dissolved buffer. Mixtures were incubated at room temperature 
for 30 min. Bound complexes were resolved on native 1% agarose gel and 
visualized by radiography. The images were quantified with Gel-Pro Analyzer (Media 
Cybernetics) software. The Kd for SE and HYL1 and apparent Kd for CHR2 and its 
variants were calculated using Prism 5 (GraphPad) software. For the modelling 
prediction for the CHR2-nucleic acid binding pattern, the original data were analysed 
with GraphPad Prism 5°; specific binding with a Hill slope model was the best fit. 

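
The "specific binding with Hill slope" model has the form Y = Bmax · X^h / (Kd^h + X^h). The sketch below writes that model out and estimates Kd and h from a linearised Hill plot, which is a simpler stand-in for the nonlinear regression that Prism performs, not a reproduction of it. The function names and the synthetic data are hypothetical.

```python
import math


def hill_binding(x, bmax, kd, h):
    """Specific binding with Hill slope: Y = Bmax * X^h / (Kd^h + X^h)."""
    return bmax * x ** h / (kd ** h + x ** h)


def fit_hill_linearised(conc, bound, bmax=1.0):
    """Estimate (Kd, h) from the Hill plot.

    Uses log10(Y / (Bmax - Y)) = h * log10(X) - h * log10(Kd) and an ordinary
    least-squares line. Points with Y <= 0 or Y >= Bmax are skipped because
    the transform is undefined there.
    """
    pts = [
        (math.log10(x), math.log10(y / (bmax - y)))
        for x, y in zip(conc, bound)
        if x > 0 and 0 < y < bmax
    ]
    if len(pts) < 2:
        raise ValueError("need at least two usable data points")
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    h = sxy / sxx
    intercept = mean_y - h * mean_x
    kd = 10 ** (-intercept / h)
    return kd, h


# Synthetic fraction-bound data generated from the model itself (Kd = 40, h = 2).
protein_nM = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
fraction_bound = [hill_binding(c, 1.0, 40.0, 2.0) for c in protein_nM]
print(fit_hill_linearised(protein_nM, fraction_bound))
```

The linearised fit weights points near saturation poorly when the data are noisy, which is one reason direct nonlinear fitting is generally preferred for real gel quantifications.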
UV-crosslinking and ribonucleoprotein immunoprecipitation. Three-week-old 
plants grown on soil were irradiated four times with UV at 150 mJ/cm2. The fixed 
samples were ground in liquid nitrogen and nuclei were isolated using a chromatin 
immunoprecipitation (ChIP) protocol as described*’. The nuclear extracts were 
diluted with RIP buffer (40 mM Tris-HCl pH 7.5, 100 mM KCl, 5 mM MgCl2, 
5 mM DTT, 0.2% Triton X-100, 2% glycerol, 1 mM PMSF, 25 µM MG132, 1 pellet 
per 10 ml Complete EDTA-free protease inhibitor (Roche) with 10 U/ml TURBO 
DNase (Thermo Fisher)) and immunoprecipitated with anti-c-Myc agarose 
(Sigma, Cat#: A7470) at 4°C for 4 h. Immunoprecipitates were washed three times 
with RIP buffer and once with high salt buffer (40 mM Tris-HCl pH 7.5, 500 mM 
KCl, 5 mM MgCl2, 5 mM DTT, 0.2% Triton X-100, 2% glycerol, 1 mM PMSF, 
25 µM MG132, 1 pellet per 10 ml Complete EDTA-free protease inhibitor (Roche)) 
at 4°C for 5 min, followed by two washes with the proteinase K buffer (100 mM 
Tris-HCl pH 7.5, 50 mM NaCl, 10 mM EDTA). The beads were treated with 
Proteinase K (4 mg/ml) in 150 µl Proteinase K buffer at 37°C for 20 min. After 
treatment with Proteinase K, the RNA was extracted using TRI Reagent (Sigma) 
and treated with TURBO DNase (Thermo Fisher). One half of the RNA was 
directly used for RT-qPCR. The other half (as a control) was further treated with 
RNase A (Sigma) and RNase T1 (Thermo Fisher) before RT-qPCR. 

dsDNA and pri-miRNA remodelling/unwinding assays. For the pri-miRNA 
substrate, a 5’ 32P-labelled 26-nt RNA fragment was annealed to a truncated strand 
of pri-miR166f to generate a nicked pri-miR166f (Extended Data Fig. 9b). For 
the dsDNA substrate, a 5’ 32P-labelled 19-nt ssDNA fragment was annealed to a 
long ssDNA fragment to generate a dsDNA with a long 5’ overhang (Extended Data 
Fig. 9e). The sequences of primers used to make dsDNA or pri-miRNA are listed 
in Supplementary Table 4. Recombinant CHR2 or its derivatives and annealed 
dsDNA or pri-miRNA were mixed in the remodelling buffer (20 mM Tris-HCl 
pH 7.5, 5 mM MgCl2, 2 mM DTT, 10 mM ATP, 1 mM GTP, 0.3% NP-40, 1 U/µl 
SUPERase-In RNase Inhibitor (specific for the pri-miRNA remodelling assay)). 
The final pooled concentration of NaCl and KCl was ~55 mM, of which 50 mM 
was from CHR2 protein dialysis buffer and 5 mM from dsDNA- or RNA-dissolved 
buffer. Before proteinase K treatment, the reactions were incubated at 37°C (for 
DNA) or 25°C (for RNA) for the indicated times. The ssDNA and dsDNA or RNA 
were fractionated using 15% (for DNA) and 12% (for RNA) native PAGE. The 
signals were detected by radiography. The images were quantified with ImageQuant 
TL (GE Healthcare). 

In vivo DMS modification. For primer extension assays to probe in vivo RNA 
folding, approximately 3 g three-week-old hyl1-2 chr2-1;PCHR2-gCHR2-FM and 
hyl1-2 chr2-1a plants grown on soil were collected and completely covered in 
20 ml 1× DMS reaction buffer (40 mM HEPES pH 7.5, 100 mM KCl and 0.5 mM 
MgCl2) in a 50-ml Corning tube. DMS (Sigma, Cat#: D186309) was added to a 
final concentration of 0.75% as described. Mock treatment was performed by 
addition of deionized water. Samples were treated in the DMS reaction buffer or 
mock solution at room temperature under vacuum. Several incubation times 
were initially tested; 30 min was found to be the optimal 
incubation time for adult plants (Extended Data Fig. 11c). To quench the 
reaction, 5 ml β-mercaptoethanol was added to a final concentration of 20%, and 
the mixture was incubated for 2 min under vacuum. After washing 4 times with 
50 ml deionized water, the samples were immediately frozen with liquid N2 and 
ground into powder. 
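
The volumes quoted for the DMS treatment and the quench step can be checked with simple dilution arithmetic. The following is a minimal sketch, assuming neat (100% v/v) stocks and additive volumes; the function names are illustrative and are not part of the original protocol.

```python
def final_percent(added_ml, start_ml, stock_percent=100.0):
    """Final concentration (% v/v) after adding stock to a starting volume."""
    return stock_percent * added_ml / (start_ml + added_ml)


def volume_to_add(target_percent, start_ml, stock_percent=100.0):
    """Stock volume (ml) needed to reach target_percent in the final mixture."""
    # target = stock * v / (start + v)  =>  v = target * start / (stock - target)
    return target_percent * start_ml / (stock_percent - target_percent)


# Quench step: 5 ml of neat beta-mercaptoethanol into 20 ml of reaction buffer.
quench_percent = final_percent(added_ml=5.0, start_ml=20.0)

# DMS step: volume of neat DMS giving 0.75% (v/v) in 20 ml of buffer
# (assumes the stock is neat DMS, which the text does not state explicitly).
dms_ml = volume_to_add(target_percent=0.75, start_ml=20.0)

print(f"quench: {quench_percent:.1f}% (v/v)")
print(f"DMS to add: {dms_ml * 1000:.0f} µl")
```

Under these assumptions, adding 5 ml to 20 ml gives the stated 20% final concentration; the small volume of DMS already present lowers this only marginally.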

For target-specific DMS-MaPseq, three-week-old Col-0 and chr2-1 plants 
grown on soil were collected and treated with or without DMS using the same 
conditions as above except for DMS concentration. Several dosages of DMS were 
also tested; eventually 1% DMS was chosen for the assay because plant materials 
turned brown under treatment with 5% and 1.5% DMS and total RNA appeared 
to decay under these conditions. 

Nuclear RNA extraction. Nuclear RNA extraction was performed as described. 
In brief, 0.3 g ground powder was dissolved in 5 ml lysis buffer (20 mM HEPES 
pH 7.5, 20 mM KCl, 2.5 mM MgCl2, 25% glycerol, 250 mM sucrose, 5 mM DTT, 
1 U/10 µl RNase inhibitor (Thermo Fisher) and 1× Proteinase inhibitor cocktail 
without EDTA (Roche)). The homogenate was filtered through a double layer of 
Miracloth. The flow-through was centrifuged at 1,500g for 10 min at 4°C. After 
removal of supernatant, the pellet was washed twice with 5 ml nuclear resuspension 
buffer (20 mM HEPES pH 7.5, 2.5 mM MgCl2, 25% glycerol, 0.2% Triton X-100, 
20 µl RNase inhibitor (Thermo Fisher)). Then the total nuclear RNA was extracted 
using TRI Reagent (Sigma). 

Primer extension assays for probing in vivo RNA folding. Probing of in vivo RNA 
secondary structure by primer extension assay was performed as described, with 
modifications. For each sample, 2 µg total nuclear RNA was treated with TURBO 
DNase (Thermo Fisher), followed by phenol-chloroform extraction. DNase-treated 
nuclear RNA was mixed with ~200,000 c.p.m. 32P-radiolabelled gene-specific 
primer (Supplementary Table 4). The mixture was precipitated with ethanol 
and resuspended in 10 µl Tris-KCl solution (10 mM Tris-HCl pH 7.5 and 50 mM 
KCl). The solution was heated at 75°C for 3 min, followed by incubation at 35°C 
(for 18S rRNA) or 55°C (for pri-miR164b) for 15 min. Then 10 µl reverse transcription 
reaction buffer mixed with 1 mM DTT, 1 mM dNTPs, 1 µl RNase Inhibitor 
(Thermo Fisher) and 1 µl SuperScript III (for 18S rRNA) (Thermo Fisher) or 
SuperScript IV (for pri-miR164b) reverse transcriptase (Thermo Fisher) was 
added. The reaction proceeded for 1 h at 55°C (for 18S rRNA) or 60°C (for 
pri-miR164b). To stop the reaction and hydrolyse the RNA, 2 µl of 2 M NaOH 
was added and the mixture was heated at 95°C for 10 min. After neutralization with 
5 M HCl, the mixture was phenol-chloroform extracted and precipitated. Then 
the cDNA was resuspended in loading buffer (95% deionized formamide, 0.025% 
bromophenol blue, 0.025% xylene cyanol, 5 mM EDTA and 0.025% SDS) and size 
fractionated on an 8% denaturing polyacrylamide gel. The gel image was collected with a 
Typhoon FLA7000 (GE Healthcare) and bands were quantified using ImageQuant 
TL (GE Healthcare). 

Target-specific DMS-MaPseq. Target-specific DMS-MaPseq was performed as 
described. For each sample, 5 µg DNase-treated nuclear RNA was mixed with 
gene-specific RT primers (5 pmol each primer, up to 5 gene-specific primers in 
one reaction, Supplementary Table 4). The mixture was precipitated and 
resuspended in 10 µl Tris-KCl solution (10 mM Tris-HCl pH 7.5 and 50 mM KCl). The 
solution was heated at 75°C for 3 min, followed by incubation at 57°C for 15 min. 
Four microlitres 5× First-Strand buffer (250 mM Tris-HCl pH 8.3, 375 mM KCl, 
15 mM MgCl2), 1 µl 0.1 M DTT (prepared freshly), 1 µl RNase inhibitor (Thermo 
Fisher), 1 µl H2O and 1 µl TGIRT-III (InGex, Cat#: TGIRT50) were added. The 
mixture was then incubated at room temperature for 30 min. Then 2 µl 10 mM 
dNTP was added and the reverse transcription proceeded at 63°C for 2.5 h. The 
reaction was stopped by adding 2 µl of 2.5 M NaOH and heating at 95°C for 3 min. 
After neutralization by 5 M HCl, the mixture was phenol-chloroform extracted 
and applied to an illustra MicroSpin S-200 HR column (GE Healthcare) to remove 
RT primers and nucleotides. The cleaned cDNA was precipitated and resuspended 
in deionized water. Then pri-miRNAs were amplified using KOD hot start DNA 
polymerase (Millipore) with gene-specific primers (Supplementary Table 4). PCR 
bands were gel purified and normalized according to band intensity before library 
construction. 
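
As a consistency check on the TGIRT-III reverse transcription set-up, the component volumes listed above can be summed and the final concentrations derived. This is a minimal sketch under the assumption that volumes are additive and that the 10 µl of Tris-KCl solution carries the RNA and primers; the dictionary keys and helper name are illustrative only.

```python
# Volumes (µl) added per reaction, as listed in the text.
components_ul = {
    "RNA and primers in Tris-KCl": 10.0,
    "5x First-Strand buffer": 4.0,
    "0.1 M DTT": 1.0,
    "RNase inhibitor": 1.0,
    "H2O": 1.0,
    "TGIRT-III": 1.0,
    "10 mM dNTP": 2.0,
}

total_ul = sum(components_ul.values())


def final_conc(stock_conc, volume_ul, total=total_ul):
    """Concentration after dilution into the full reaction (same unit as stock)."""
    return stock_conc * volume_ul / total


buffer_x = final_conc(5.0, components_ul["5x First-Strand buffer"])  # in "x"
dtt_mM = final_conc(100.0, components_ul["0.1 M DTT"])               # in mM
dntp_mM = final_conc(10.0, components_ul["10 mM dNTP"])              # in mM

print(f"total volume: {total_ul:.0f} µl")
print(f"buffer: {buffer_x:.1f}x, DTT: {dtt_mM:.1f} mM, dNTP: {dntp_mM:.1f} mM")
```

With these numbers the reaction comes to 20 µl, so the 5× buffer ends up at 1×, which suggests the listed volumes are internally consistent.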

PCR products were mixed equally and fragmented into 50-200 bp using 
NEBNext dsDNA Fragmentase (NEB) following the manufacturer’s protocol. After 
purification using QIAquick nucleotide removal kit (QIAGEN), the fragments were 
subjected to end repair, adenylation and adaptor ligation using Illumina adapters, 
mainly following the published protocol. The fragments were barcoded through 
adaptor ligation. The ligation products were size fractionated on 3% low melting 
agarose gel, and 250-350 bp adaptor-ligated fragments were purified by gel 
excision. Next, the purified barcoded libraries were enriched by 12 cycles of PCR 
using KOD hot start DNA polymerase. Finally, the PCR products were cleaned 
using Agencourt AMPure XP beads (Beckman). The libraries were quantified using 
Agilent TapeStation before sequencing by 150 bp single-end reads on the Illumina 
NextSeq 500 at Texas A&M University. 

Sequencing alignment and analysis. Raw fastq files were first quality-filtered 
with sickle (https://github.com/ucdavis-bioinformatics/sickle), requiring a 
quality score >30, and trimmed to remove the adaptor sequences 
with Cutadapt. Clean reads over 35 bp were retained. Reads were aligned using 
Burrows-Wheeler Aligner (BWA), allowing 5% mismatches with the settings: bwa 
aln -n 0.05. The mapping results were sorted by coordinate using Picard tools 
(http://broadinstitute.github.io/picard/), and the base calls of aligned reads to the 
reference sequence were summarized using samtools (v1.5) mpileup. Nucleotide 
mismatches and sequencing depth were extracted using sequenza-utils (v2.1) 
pileup2acgt (https://pypi.python.org/pypi/sequenza-utils). The DMS signal was 
calculated as the number of mismatches per sequencing depth for each adenine (A) 
and cytosine (C) nucleotide. The two biological replicates were normalized to 
identical means of total DMS signal. Then the average DMS signal for each A and 
C nucleotide was calculated. Based on the DMS signal, the secondary structures 
of pri-miRNAs were modelled by RNAfold. A threshold was used to separate 
adenine and cytosine bases into paired and unpaired nucleotides to produce the 
best-fitting model for our experimental data. The threshold varied 
between 0.01 and 0.03 depending on the pri-miRNA. DMS signals were colour 
coded on structure models using VARNA (http://varna.lri.fr). 
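
The DMS signal calculation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the choice of the common mean (the average of the two replicate means), the toy counts, and the use of 'x' (must not pair) and '.' (no constraint) from the ViennaRNA constraint notation are all assumptions, since the text does not say how the threshold was passed to RNAfold.

```python
from statistics import mean


def dms_signal(ref_base, depth, mismatches):
    """Mismatch fraction at one position; None for G/U or uncovered positions."""
    if ref_base.upper() not in ("A", "C") or depth == 0:
        return None
    return mismatches / depth


def normalise_and_average(rep1, rep2):
    """Scale two replicates to a shared mean signal, then average per position."""
    shared = [(a, b) for a, b in zip(rep1, rep2) if a is not None and b is not None]
    mean1 = mean(a for a, _ in shared)
    mean2 = mean(b for _, b in shared)
    target = (mean1 + mean2) / 2  # assumed choice of common mean
    averaged = []
    for a, b in zip(rep1, rep2):
        if a is None or b is None:
            averaged.append(None)
        else:
            averaged.append((a * target / mean1 + b * target / mean2) / 2)
    return averaged


def unpaired_constraint(signals, threshold):
    """'x' marks A/C positions at or above the threshold; '.' leaves the rest free."""
    return "".join("x" if s is not None and s >= threshold else "." for s in signals)


# Hypothetical pileup counts for a 6-nt toy sequence (not data from the study).
sequence = "ACGUAC"
depth = [1000] * 6
mismatches_rep1 = [20, 5, 3, 2, 40, 10]
mismatches_rep2 = [10, 2, 1, 1, 20, 6]

rep1 = [dms_signal(b, d, m) for b, d, m in zip(sequence, depth, mismatches_rep1)]
rep2 = [dms_signal(b, d, m) for b, d, m in zip(sequence, depth, mismatches_rep2)]
average = normalise_and_average(rep1, rep2)
constraint = unpaired_constraint(average, threshold=0.01)
print(constraint)
```

Positions whose averaged signal reaches the threshold are treated as unpaired; lowering or raising the threshold within the quoted 0.01 to 0.03 range changes how many A and C bases are constrained.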

Quantification and statistical analysis. The images of Microprocessor activity and 
sRNA blot assays were quantified with Gel-Pro Analyzer (Media Cybernetics). The 
Kd for EMSA assays was calculated using Prism 5 (GraphPad) software. The images 
from primer extension assays, dsDNA and pri-miRNA remodelling/helicase 
assays and ATPase assays were quantified with ImageQuant TL (GE Healthcare). 
Unpaired, two-tailed Student's t-test was performed using the software Excel. All 
statistics are described below unless specifically mentioned in the Figures or Figure 
legends. 

For Figs. 3, 5d, quantification of the Microprocessor cleavage efficiency was 
calculated by the ratio of processed to unprocessed pri-miR166f fragments. The 
relative efficiency was normalized to that of the Microprocessor reaction alone 
where the ratio was arbitrarily set at 1 with s.d. from three replicates. 
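
The normalization of cleavage efficiency can be illustrated with a short sketch. The band intensities below are hypothetical, and dividing every replicate by the mean ratio of the Microprocessor-alone reaction is an assumed reading of "arbitrarily set at 1"; the function names are illustrative.

```python
from statistics import mean, stdev


def cleavage_ratio(processed, unprocessed):
    """Ratio of processed to unprocessed pri-miRNA band intensity."""
    return processed / unprocessed


def relative_efficiency(sample_ratios, reference_ratios):
    """Mean and s.d. of sample ratios after dividing by the mean reference ratio."""
    reference = mean(reference_ratios)
    relative = [r / reference for r in sample_ratios]
    return mean(relative), stdev(relative)


# Hypothetical band intensities from three replicates (arbitrary units).
microprocessor_alone = [cleavage_ratio(60, 30), cleavage_ratio(66, 30), cleavage_ratio(54, 30)]
with_chr2 = [cleavage_ratio(20, 40), cleavage_ratio(18, 40), cleavage_ratio(22, 40)]

ref_mean, ref_sd = relative_efficiency(microprocessor_alone, microprocessor_alone)
chr2_mean, chr2_sd = relative_efficiency(with_chr2, microprocessor_alone)

print(f"Microprocessor alone: {ref_mean:.2f} ± {ref_sd:.2f}")
print(f"With CHR2:            {chr2_mean:.2f} ± {chr2_sd:.3f}")
```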

For Fig. 4d, the relative signal of pri-miRNAs was normalized initially to the 
input and then to the control immunoprecipitate, where the ratio was arbitrarily 
set to 1 with s.d. from three biological repeats (*P < 0.05; **P < 0.01; unpaired, 
two-tailed Student's t-test). 
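
A minimal sketch of the two-step RIP normalization and of the unpaired, equal-variance Student's t statistic follows. The numbers are hypothetical and the helper names are illustrative; the two-tailed P value itself would be read from a t distribution with the returned degrees of freedom (for example in Excel, as stated above), which this sketch does not compute.

```python
from math import sqrt
from statistics import mean, variance


def rip_enrichment(ip, input_signal, ctrl_ip, ctrl_input):
    """(IP / input) of the sample divided by (IP / input) of the control IP."""
    return (ip / input_signal) / (ctrl_ip / ctrl_input)


def student_t(x, y):
    """Unpaired, equal-variance t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


# Hypothetical relative signals: the control immunoprecipitate is 1 by construction.
enrichment = rip_enrichment(ip=8.0, input_signal=100.0, ctrl_ip=1.0, ctrl_input=100.0)

# Hypothetical normalized values from three biological repeats per group.
t_stat, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"enrichment: {enrichment:.1f}, t = {t_stat:.3f}, d.f. = {dof}")
```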

For Fig. 5e, f and Extended Data Figs. 11e-h, 12, in the primer extension assays, 
the colour intensity of coded nucleotides along the pri-miR164b illustrates the 
relative DMS reactivity, calculated as the band signal intensity normalized 
to the highest level, which was set arbitrarily to 1.0. DMS (+/−) refers to the 
DMS-treated and untreated samples, respectively. In the DMS-MaPseq analysis, 
DMS mutation frequencies of A and C residues (mismatch/total) from targeted 
pri-miRNAs averaged from two biological replicates were plotted along pri-miRNA 
sequence. Position 1 corresponds to the 5’ end of modelled regions. The overall and 
zoomed-in secondary structures of the targeted pri-miRNAs were modelled from 
DMS activity according to RNAfold. The nucleotides that displayed different 
DMS activity on pri-miRNAs between Col-0 and chr2-1 plants are colour-coded. 
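
For the primer extension assays, scaling band signals to the strongest band can be sketched as below. The intensities are hypothetical and the function name is illustrative.

```python
def relative_reactivity(band_signals):
    """Scale band signals so that the strongest band equals 1.0."""
    top = max(band_signals)
    if top <= 0:
        raise ValueError("at least one band must have a positive signal")
    return [s / top for s in band_signals]


# Hypothetical band intensities along a pri-miRNA (arbitrary units).
bands = [1200.0, 300.0, 2400.0, 600.0]
scaled = relative_reactivity(bands)
print(scaled)
```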

For Extended Data Fig. 4, previously published datasets were reanalysed. 

Graph drawing. Graphs with dot plots (individual data points) were drawn using 
GraphPad Prism 7. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Original data that support the findings of this study are available 
from the corresponding author upon reasonable request. The GEO accession 
numbers for sRNA sequence and DMS-MaPseq are GSE108858 and GSE108857, 
respectively. Source Data for graphs plotted in Figs. 1, 3-5 are available in the online 
version of this paper. Gel source data for Figs. 1, 2, 4, 5, Extended Data Figs. 1-5, 
8-11 are available in Supplementary Fig. 1. 

Extended Data Fig. 1 | CHR2 is a partner of the Microprocessor 
component SE and represses miRNA accumulation in Arabidopsis. 
a, Size-exclusion chromatography of total protein extracts from 
Col-0; P35S-FM-SE transgenic plants shows the FM-SE enrichment 
in macromolecular complexes (>680 kDa). Western blot analysis was 
conducted with an anti-Flag antibody. b, CHR2 peptides recovered from 
proteomic analysis of the SE complex but not a control immunoprecipitation 
from Col-0 plants. SE complexes were isolated from Col-0; P35S-FM-SE 
transgenic plants by two-step affinity purification, and analysed by mass 
spectrometry. c, Confocal microscopic images show co-localization of 
SE-CFP and CHR2-YFP when co-expressed in Arabidopsis protoplasts. 
Top, SE-CFP and CHR2-YFP, individually expressed, serve as control. 
Scale bar, 5 µm. d, Luciferase complementary imaging (LCI) assay 
shows the specific CHR2-SE interaction in Nicotiana benthamiana. 
The infiltration scheme in the leaf shows different combinations of 
constructs fused to either N-terminal (nLuc) or C-terminal (cLuc) regions 
of luciferase. LCI complementation (LUC), bright field, and merged 
photograph (Merge) are shown. The red arrows and colour bar indicate 
the infiltration positions and the signal intensity, respectively. e, Schematic 
illustration of T-DNA insertions (reverse triangles) or deletion (triangle) 
in CHR2 and SE genes. f, Opposite leaf polarity phenotypes in chr2-1 
and se loss-of-function mutants. Pictures of 28-day-old plants are 
shown. Scale bar, 1 cm. g, Additional replicate of sRNA blot analysis 
shows enhanced accumulation of miRNAs in the chr2-1 null mutant. 
h, Flower developmental defects in chr2-1. Scale bar, 0.5 mm. i, sRNA 
blot analysis shows that miRNAs accumulated in chr2-1 flowers. j, The 
chr2-1 morphological defect was fully rescued by the PCHR2-gCHR2-FM 
transgene. Scale bar, 1 cm. k, sRNA blot analysis shows that miRNA 
accumulation was restored to wild-type miRNA levels in the chr2-1; 
PCHR2-gCHR2-FM complementation lines (top). Western blot analysis 
of CHR2-FM using an anti-Myc antibody is shown (bottom). l, sRNA-seq 
revealed that the accumulation of 259 of 365 annotated miRNAs 
(http://www.mirbase.org) in Arabidopsis is SE-dependent. m, Overlap of 
SE-regulated and CHR2-repressed miRNAs in Arabidopsis. 

Extended Data Fig. 2 | CHR2 plays self-antagonistic roles in 
the regulation of miRNA accumulation at transcriptional and 
posttranscriptional levels. a, Histochemical staining analysis of GUS 
activity shows comparable transcriptional levels of tested MIR genes 
in wild-type (WT, left, big plants) and chr2-1 (right, small plants) 
backgrounds. The promoters of individual MIR genes were fused 
to the FM-GUS coding sequence and transformed into a chr2-1+/− 
heterozygote line. Twenty-one-day-old T2 generation plants were used for 
GUS staining. The SAM and leaf primordia of PMIR164a- and 
PMIR164c-FM-GUS transgenic plants are magnified to comparable sizes. Scale bar, 
1 cm. b, c, Western blot analysis shows that protein levels of FM-GUS 
from the native MIR promoters were higher in wild-type (+/+) than in 
chr2-1 (−/−) plants. Western blot assays were done with an anti-Flag 
antibody. Actin serves as a loading control. d, Schematic illustration 
of the conserved functional domains in CHR2 and the locations of the 
point mutations N1392A and R1417A in CHR2-1a (red arrows). e, Thin-layer 
chromatography (TLC) assay shows that the CHR2-1a variant 
has significantly reduced ATPase activity in vitro. The ATPase assay was 
conducted with 500 ng dsDNA for 2 h. Bottom, SDS-PAGE of purified 
CHR2 and CHR2-1a proteins. f, Western blot analysis using an anti-Myc 
antibody shows comparable expression of CHR2 and CHR2-1a in the chr2-1 
background. Actin serves as a loading control. g, chr2-1 morphological 
and miRNA accumulation defects, analysed by sRNA blot assay, were 
partially rescued in chr2-1a (chr2-1;PCHR2-gCHR2-1a-FM) plants. 
Scale bar, 1 cm. U6 serves as a loading control for sRNA blot analysis. 
In f, g, * shows the same samples as in Extended Data Fig. 10e. 
h, Morphological phenotypes of the hypomorphic chr2-1a mutant and its 
derived double mutants. Photographs of 21-day-old plants are shown. 
Scale bar, 1 cm. i, RT-qPCR analysis of pri-miRNAs in single and double 
mutants relative to Col-0. Total RNA was prepared from a pool of 
21-day-old plants; EF-1α serves as a loading control. The relative signals of 
pri-miRNAs were normalized to those of wild-type Col-0, where the ratio 
was arbitrarily set to 1 with s.d. calculated from three biological repeats. 
Individual data points are shown. Data significantly different from the 
corresponding controls are indicated (chr2-1a versus Col-0, **P < 0.01; 
chr2-1a hyl1-2 versus hyl1-2, +P < 0.05, ++P < 0.01; unpaired, two-tailed 
Student’s t-test). 

Extended Data Fig. 3 | Delineation of the CHR2-SE interaction 
interface and identification of CHR2 E1747 as a critical residue for 
the specific CHR2-SE interaction by Y2H assays. a, Rough mapping of 
CHR2 domains that interact with SE. b, Fine mapping of the CHR2-SE 
interaction interface and final identification of CHR2 E1747 as a critical 
residue for the CHR2-SE interaction. Note: part of the image related to 
CHR2 E1747A is also shown in Fig. 2a. AD, GAL4 activation domain; 
BD, GAL4 DNA binding domain. −LT, lacking Leu and Trp; −LTHA, 
lacking Leu, Trp, His and Ade. 1:5 serial dilutions are shown. c, The 
DESRM domain and E1747 residue of CHR2 that are involved in its 
interaction with SE are conserved through plants. Sequence alignment of 
DESRM domains of CHR2 proteins across different species (the region 
is rich in Asp-Glu-Ser residues, and therefore named the DES-rich motif 
(DESRM)). Red star indicates E1747, which is critical for the CHR2-SE 
interaction. Conserved neutral, acidic and basic amino acids are marked 
in black, blue, and red, respectively. d, Western blot analysis shows that 
expression levels of CHR2 variants were comparable in transformed yeast 
colonies. Western blot assays were conducted with anti-HA and anti-Myc 
antibodies to detect CHR2 and SE, respectively. Asterisks indicate 
non-specific bands. e, Summary of Y2H assays. Left, schematic illustration 
of CHR2 variants used for Y2H. The numbers indicate the positions 
of the amino acid residues in the constructs. Right, + and − denote 
positive and negative results for the CHR2-SE interaction from Y2H 
assays in a and b. f, Histochemical staining analysis of GUS activity from 
PMIR159a-FM-GUS homozygotes shows that CHR2(E1747A) does not 
alter the transcriptional level of MIR159a in Col-0 and 
chr2-1;PCHR2-gCHR2(E1747A)-FM backgrounds. Scale bar, 1 cm. 

Extended Data Fig. 4 | CHR2 does not affect the expression of canonical 
components involved in miRNA biogenesis and metabolism, nor 
the SE-dependent pre-mRNA splicing pathway. a, Gene Ontology 
analysis of differentially expressed genes in chr2-1 plants. The dataset 
is mined from a previously published RNA-seq database for the chr2 
mutant. b, Re-analysis of previously published microarray data sets for 
chr2 mutants to assess the expression of numerous canonical genes 
involved in the miRNA pathways. c, d, RT-PCR (c) and RT-qPCR (d) 
show that the expression levels of the genes involved in the miRNA 
pathway are generally comparable between Col-0 and the chr2-1 mutant. 
At1g22690 and At5g27845 serve as controls for down- and upregulated 
genes in chr2-1. Data significantly different from Col-0 controls are 
indicated (*P < 0.05, **P < 0.01; unpaired, two-tailed Student's t-test). 
In b, d, the relative signals of genes were normalized to those of wild-type, 
where the ratio was arbitrarily set to 1 with s.d. calculated from published 
data (b) or three biological repeats (d). Individual data points are shown. 
e, Western blot analysis shows that chr2-1 does not show enhanced 
accumulation of Microprocessor proteins in vivo. Western blot assays were 
conducted using antibodies against native DCL1 and HYL1 proteins; an 
anti-Myc antibody was used to detect the native promoter-driven FM-SE 
protein. Actin serves as a control. f, g, Re-analysis of previously published 
RNA-seq data sets for chr2-1/brm-1 and se mutants shows that 
there is no significant overlap of abnormal intron retention events (f) and 
abnormal alternative splicing between se and chr2-1 mutants (g). h, RT-PCR 
analysis of selected marker genes shows that CHR2 is not involved in 
the SE-coordinated pre-mRNA splicing pathway. gDNA (genomic DNA) 
serves as a control. 
[Figure panel labels: Col-0, se-1 (NAR, 2014), chr2-1, brm-101, Ler; lanes se-2, Col-0, chr2-1, gDNA; At1g13880, At1g28520, At3g17430; abnormal AS.] 

Extended Data Fig. 5 | In vitro Microprocessor assays show that CHR2 
inhibits pri-miRNA processing. a, SDS-PAGE of recombinant proteins 
purified from a Baculovirus/insect system or from E. coli. b, The in 
vitro-reconstituted Microprocessor processed pri-miRNA substrates as 
accurately and efficiently as DCL1 complex immunoprecipitated from 
plants in vivo. c, Pre-incubation of CHR2 with pri-miRNAs inhibited 
the miRNA processing in vitro. hSTING serves as a negative control. 
On the left, schematic illustration of in vitro reconstitution orders 
with recombinant CHR2, 5’ 32P-labelled pri-miR166f transcript and 
Microprocessor components. RT, room temperature. On the right, 
autoradiography of in vitro Microprocessor assays (two replicates). Black 
arrows indicate processed and unprocessed fragments of pri-miR166f. 
d, CHR2 inhibition of Microprocessor activity was positively correlated 
with its amount applied. Molar ratio CHR2:DCL1 is shown at the top 
of the autoradiograph. The reconstitution assay was conducted as in 
Fig. 3a. e, CHR2 inhibition of pri-miRNA processing was positively 
correlated with the pre-incubation time of CHR2 with pri-miRNA. The 
pre-incubation time of CHR2 with pri-miR166f is shown at the top of 
the autoradiograph. The reconstitution assay was conducted as in Fig. 3a. 
f, Incubation of HYL1-SE with pri-miRNAs before addition of CHR2 
largely bypassed the inhibitory effect of CHR2 on pri-miRNA processing 
in vitro. Left, schematic representation of the in vitro reconstitution 
assay. Right, autoradiography of in vitro Microprocessor assays (two 
replicates). g, Incubation of CHR2 variant compromised in SE interaction 
(CHR2(E1747A)) with pri-miRNA substantially reduced pri-miRNA 
processing in vitro. Left, schematic representation of the in vitro 
reconstitution assay. Right, autoradiography of in vitro Microprocessor 
assays (two replicates). h, CHR2(E1747A) had only a marginal inhibitory 
effect on pri-miRNA processing when the substrate pri-miRNA was 
pre-incubated with excessive SE. Molar ratio of SE:CHR2 (or its variant) was 
4:1. Left, schematic representation of in vitro reconstitution assay. Right, 
autoradiography of in vitro Microprocessor assays (three replicates). 

Extended Data Fig. 6 | Three non-exclusive hypotheses for the 
mechanism by which CHR2 inhibits miRNA production; and CHR2 
and DCL1-HYL1 interact with different domains of SE. a, Schematic 
illustration of three non-exclusive hypotheses for the mechanism by 
which CHR2 inhibits miRNA processing. b, Summarized Y2H results 
for mapping of the DCL1-SE and CHR2-SE interaction interfaces in 
SE protein (detailed results are shown in Extended Data Fig. 7a, b). Left, 
schematic illustration of the truncation and deletion variants of SE and 
DCL1 used for Y2H. The numbers indicate the positions of the amino acid 
residues in the constructs. Right, + and − denote positive and negative 
Y2H results for the interaction with SE, respectively. c, Proposed models 
of the protein interactions between DCL1-HYL1 and SE and between 
CHR2 and SE. DCL1 occupies the zinc finger domain (ZnF) of SE through 
its PAZ domain; dsRNA-binding domain 2 (RBD2) in HYL1 interacts 
with the N-terminal and mid domains (N+Mid) of the SE core. The 
DESRM motif in CHR2 contacts the GAPE domain (named for enriched 
Gly-Ala-Pro-Glu residues) in the unstructured C terminus of SE. Numbers 
show the amino acid residues comprising the indicated domains. The red, 
blue and cyan arrows indicate the interaction domains of the indicated 
proteins. d, RT-PCR shows that all genes are expressed well in the Y3H 
system. All transfected yeast cells were cultured in −Leu, −Trp and −Met 
synthetic defined broth. ScSED1 serves as a control. e, f, Y3H assays show 
that CHR2 does not interfere with the accessibility of DCL1-HYL1 (e) or 
that of U1 snRNP component (Prp40b) (f) to SE protein. 1:10 serial 
dilutions are shown. −LTM, lacking Leu, Trp and Met; −LTMH + 5 mM 
3-AT, lacking Leu, Trp, Met, His and adding 5 mM 3-AT. 

Extended Data Fig. 7 | Mapping the regions of SE that interact with 
DCL1 and CHR2 by Y2H; and the GAPE domain of SE that interacts 
with CHR2 is conserved through plants. a, b, Detailed Y2H results of 
mapping the DCL1-SE interaction interface (a) and the interaction region 
of SE with CHR2 (b); 1:5 serial dilutions are shown. −LT, lacking Leu 
and Trp; −LTHA, lacking Leu, Trp, His and Ade. c, Sequence alignment 
of GAPE domains of SE proteins across different species; conserved 
neutral, acidic and basic amino acids are marked by black, blue, and red, 
respectively. 

Extended Data Fig. 8 | CHR2 binds to nucleic acids in vitro and in vivo. 
a, EMSA shows CHR2 bound strongly to 5’ end 32P-labelled pre-miRNA 
but weakly to ssRNA. Arrows indicate the mobility of protein-RNA 
complexes or labelled free RNA. b, c, EMSA shows the mobility pattern 
of the CHR2-pri-miRNA (b) and CHR2-pre-miRNA complexes (c). 
HYL1 and hSTING are positive and negative controls, respectively. Cold 
probes, used for the binding competition, are unlabelled pri-miRNA and 
pre-miRNAs, accordingly. d, e, EMSA shows that CHR2 bound to dsDNA 
but not ssDNA. Arrows indicate the mobility of protein-DNA complexes 
or free DNA. f, The binding affinity (apparent Kd) of CHR2-dsDNA was 
calculated from EMSA image quantification with s.d. from three replicates. 
Individual data points are shown. g, EMSA shows the mobility pattern 
of SE-pri-miRNA complex. h, EMSA shows the mobility pattern of 
HYL1-pri-miRNA complex. i, EMSA shows that HYL1 readily sequestered 
pri-miRNA from SE, CHR2 or CHR2-SE complexes. j, Western blot 
analysis of CHR2-FM protein immunoprecipitated from hyl1-2 
chr2-1;PCHR2-gCHR2-FM plants using an anti-Myc antibody. Histone 3 serves 
as a control for input. k, RT-qPCR shows that no or only faint signals were 
detected in the RNase-treated RIP samples. The result indicated that 
the nucleic acids recovered from the CHR2 immunoprecipitate shown 
in Fig. 4d were indeed RNAs. The relative signals of pri-miRNAs were 
normalized initially to those of the input and then to the control 
immunoprecipitate without RNase treatment, where the ratio was 
arbitrarily set to 1 with s.d. from three biological repeats. Individual data 
points are shown. ND, not detected. 

Extended Data Fig. 9 | Effect of CHR2 mutations on remodelling/helicase 
and ATPase activities. a, SDS-PAGE of purified CHR2 variants. 
b, Schematic diagram of preparation of the 32P-labelled and nicked 
pri-miRNA. c, Remodelling/helicase activity of CHR2 and 
CHR2(D1355A/R1385A) on nicked pri-miRNAs. Note: D1355A R1385A substitutions in 
the nucleotide binding domains compromised the remodelling activity of 
CHR2 on pri-miRNA. HYL1 served as a negative control. d, Time-course 
of remodelling/helicase activity of CHR2 and its variants on 32P-labelled 
pri-miRNA. e, Remodelling/helicase activity of CHR2 on 32P-labelled 
dsDNA. f, Time-course of remodelling/helicase activity of CHR2 and its 
variants on 32P-labelled dsDNA. g, TLC assays showed that dsDNA and 
pri-miRNA stimulated ATPase activity of CHR2 and its ATPase mutant 
(G1009A/K1012A) in a dosage-dependent manner. h, TLC assays show the 
time-course of ATPase activity of CHR2 and its ATPase mutant 
(G1009A/K1012A) in the absence (none) or presence of dsDNA (167 bp in length) 
and pri-miRNA (150 nt in length with ~50 base-pairing). i, Quantification 
of ATP hydrolysis rates of CHR2 and its ATPase mutant 
(G1009A/K1012A) in the absence (none) or presence of dsDNA and pri-miRNA 
with s.d. from three replicates. Individual data points are shown. 

Extended Data Fig. 10 | Effect of CHR2 mutations on binding affinity 
to nucleic acids, Microprocessor cleavage efficiency of pri-miRNA, 
and rescue of morphological and miRNA accumulation defects in 
chr2-1 plants. a, Binding curves of CHR2 variants with dsDNA and 
pri-miRNA. Apparent Kd values (appKd) were calculated from EMSA 
image quantification with s.d. from three replicates. b, Microprocessor 
assays show that the remodelling-compromised CHR2 mutant 
(G1009A/K1012A) failed to efficiently inhibit pri-miRNA processing. 
Left, schematic illustration of in vitro reconstitution orders with 
recombinant CHR2 or its variant, 5’ 32P-labelled pri-miR166f transcript 
and Microprocessor components. RT, room temperature. Middle and 
right panels, autoradiography of the in vitro Microprocessor assays (two 
replicates). Black arrows indicate processed and unprocessed fragments 
of pri-miR166f. c, Diagram of point mutations in the CHR2 ATP and 
nucleotide binding domains. d, Remodelling-compromised CHR2 
mutants partially rescued the morphological defect of chr2-1 plants. Scale 
bar, 1 cm. e, Remodelling-compromised CHR2 mutants partially rescued 
chr2-1 miRNA accumulation defects. sRNA blot analysis (top) and western 
blot assays of CHR2 and variants (bottom). f, Western blot analysis of 
GUS protein in PMIR159b-FM-GUS homozygotes (n > 30) in different 
backgrounds shows that CHR2(G1009A/K1012A) decreased MIR159b 
transcription. 

Extended Data Fig. 11 | CHR2 remodels the folding of pri-miRNAs or 
pri-miRNA complexes in vivo in various ways. a, Schematic diagram 
of two DMS-based methods for probing the secondary structures 
of RNA in vivo. SSIV, reverse transcriptase SuperScript IV (Thermo 
Fisher). b, Schematic structure of pri-miRNA. c, A primer-extension 
assay of DMS-cDNA truncations of 18S rRNA validated the reliability 
of the approach of the DMS-primer extension in the experiment. 
d, An additional replicate of primer-extension assay for DMS-cDNA 
truncations shows that CHR2 altered the folding of the miRNA/* duplex 
and upper stem region of pri-miR164b. Red arrows, nucleotides that had 
stronger DMS activities and thus more mispairings in pri-miR164b from 
hyl1-2 chr2-1;PCHR2-gCHR2-FM plants than those from hyl1-2 chr2-1a. 
e, DMS-MaPseq analysis shows that DMS mutation profiling of UBQ4 
transcripts was essentially identical in Col-0 and chr2-1 plants. The DMS 
mutation frequencies (mismatch/total) of A and C residues are plotted 
along the nucleotide sequences probed. Background mutations in the 
samples that were not treated with DMS are shown (top). f, Background 
mutations in pri-miR166a in Col-0 and chr2-1 samples that were not 
treated with DMS are shown. DMS-MaPseq results of other tested 
pri-miRNAs including pri-miR159a, pri-miR159b, pri-miR160a, pri-miR166b, 
pri-miR168a, pri-miR168b, and pri-miR319b in the untreated samples 
were largely identical to that of pri-miR166a, and are not shown here. 
g, h, DMS-MaPseq analysis of additional pri-miRNAs illustrates that 
CHR2 can alter nucleotide pairings and/or protein protection in terminal 
loops and upper stems of pri-miR166a (g) and pri-miR168a (h). Top, 
DMS mutation frequencies of A and C residues (mismatch/total) from 
targeted pri-miRNAs averaged from two biological replicates are plotted 
along pri-miRNA sequence. Position 1 corresponds to the 5’ end of the 
modelled regions. Bottom, overall and zoomed secondary structures of 
the targeted pri-miRNAs are predicted from DMS activities. Colour-coded 
nucleotides display different DMS activity on pri-miRNAs between Col 
and chr2-1 plants. The miRNA and * strands are marked in purple and 
cyan, respectively. 
Extended Data Fig. 12 | Additional evidence for the CHR2 function 
in remodelling the folding of pri-miRNAs or pri-miRNA complexes 
in vivo in various ways. a–d, DMS-MaPseq analysis of additional 
pri-miRNAs illustrates that CHR2 can alter nucleotide pairings and/or 
protein protection in the lower stem of pri-miR168b (a), the terminal loop 
and upper stem of pri-miR319b (b), and variable parts of pri-miR160a (c) 
and pri-miR166b (d). Top, DMS mutation frequencies of A and C residues 
(mismatch/total) from targeted pri-miRNAs averaged from two biological 
replicates are plotted along pri-miRNA sequence. Position 1 corresponds 
to the 5’ end of the modelled regions. Bottom, the overall and zoomed 
secondary structures of the targeted pri-miRNAs are predicted from DMS 
activities. Colour-coded nucleotides display different DMS activity on 
pri-miRNAs between Col and chr2-1 plants. The miRNA and * strands are 
marked in purple and cyan, respectively. 
Pulsar emission amplified and resolved by plasma 
lensing in an eclipsing binary 

Robert Main, I-Sheng Yang, Victor Chan, Dongzi Li, Fang Xi Lin, Nikhil Mahajan, Ue-Li Pen, 
Keith Vanderlinde & Marten H. van Kerkwijk 


Radio pulsars scintillate because their emission travels through the 
ionized interstellar medium along multiple paths, which interfere 
with each other. It has long been realized that, independent of their 
nature, the regions responsible for the scintillation could be used 
as ‘interstellar lenses’ to localize pulsar emission regions (refs. 1, 2). Most 
such lenses, however, resolve emission components only marginally, 
limiting results to statistical inferences and detections of small 
positional shifts (refs. 3–5). As lenses situated close to their source offer 
better resolution, it should be easier to resolve emission regions of 
pulsars located in high-density environments such as supernova 
remnants (ref. 6) or binaries in which the pulsar’s companion has an 
ionized outflow. Here we report observations of extreme plasma 
lensing in the ‘black widow’ pulsar, B1957+20, near the phase in its 
9.2-hour orbit at which its emission is eclipsed by its companion’s 
outflow (refs. 7–9). During the lensing events, the observed radio flux is 
enhanced by factors of up to 70-80 at specific frequencies. The 
strongest events clearly resolve the emission regions: they affect the 
narrow main pulse and parts of the wider interpulse differently. We 
show that the events arise naturally from density fluctuations in 
the outer regions of the outflow, and we infer a resolution of our 
lenses that is comparable to the pulsar’s radius, about 10 kilometres. 
Furthermore, the distinct frequency structures imparted by the 
lensing are reminiscent of what is observed for the repeating fast 
radio burst FRB 121102, providing observational support for the 
idea that this source is observed through, and thus at times strongly 
magnified by, plasma lenses (ref. 10). 

On 2014 June 13-16, we took 9.5 h of data of PSR B1957+20 with 
the 305-m William E. Gordon Telescope at the Arecibo observatory, at 
an observing frequency of 311.25-359.25 MHz (see Methods). These data 
were previously searched for giant pulses (ref. 11): sporadically occurring, 
extremely bright pulses, which are much shorter than regular pulses 
(around 1 µs compared with tens of microseconds). Although we found 
many giant pulses at all orbital phases, we noticed that the incidence 
rate of bright pulses was much higher leading up to and following the 
radio eclipse. 

As can be seen in Fig. 1, most of the pulses near eclipse do not look 
like giant pulses but rather like brighter regular pulses—they are bright 
over a large fraction of the pulse profile, and most tellingly, occur in 
groups spanning several 1.6-ms pulse rotations, suggesting that the 
underlying events last for a time of the order of 10 ms. Their properties 
seem similar to the hitherto mysterious bright pulses associated 
with the eclipse of PSR J1748−2446A (ref. 12), suggesting a shared physical 
mechanism. 

High-magnification events are often chromatic, as can be seen from 
the colours in Fig. 1 (which reflect 16-MHz sub-bands) and as is borne 
out more clearly by the spectra shown in Fig. 2: some show frequency 
widths comparable to our 48-MHz band, peaking at low or high fre- 
quency, whereas others show strong frequency evolution, in some cases 
tracing out a slope in frequency-time space, in others a double-peaked 
profile. 


For many high-magnification events, the magnification is not 
uniform across the pulse profile. The bottom panel of Fig. 1 shows this 
strikingly: at 0.2 s, the main pulse is greatly magnified over five pulses, 
whereas the entire interpulse is barely affected. In addition, the compo- 
nents of the broad interpulse are often magnified differently from each 
other, as can be seen most readily in the magnified events in Fig. 3b, c. 
Thus, the events resolve the pulsar’s various emission regions. 

The most strongly magnified events occur within three specific time 
spans, each lasting for about 5 min, around orbital phase φ = 0.20, 
φ = 0.30 and φ = 0.32 (see Fig. 1); here, φ = 0.25 corresponds to superior 
conjunction of the pulsar in its 9.2-h orbit (ref. 7), and the duration of 
the eclipses at 350 MHz is about 40 to 60 min (refs. 7, 9), or Δφ ≈ 0.1 
(a sketch of the system geometry is shown in Extended Data Fig. 1). In 
each time span, we observe only small overall frequency-dependent 
time delays, of the order of 10 µs, which indicate modest increases 
in electron column density, with dispersion measures of the order of 
10⁻⁴ pc cm⁻³ (see Fig. 1 as well as Extended Data Fig. 2). Intriguingly, 
at times when the delays are longer and the dispersion measures thus 
higher (immediately before and after eclipse, and in between the two 
post-eclipse periods of strong lensing, at φ ≈ 0.31), the magnifications 
are less marked, only up to a factor of a few, and correlated over 
much longer timescales, of the order of 100 ms. After the main lensing 
periods, up to φ = 0.36, flux variations remain correlated, suggesting 
that even weaker events are still present (examples of lensing in the 
aforementioned regions are shown in Extended Data Fig. 3). 

To determine whether the magnification events could be due to lensing 
by inhomogeneities in the companion’s outflow, we first measured the 
excess delay due to dispersion at a time resolution of 2 s, the shortest at 
which we can measure it reliably (see Methods). We find that during the 
periods of strong magnification events, the delay fluctuates by about 1 µs 
on this timescale. Assuming that the relative velocity between the pulsar 
and the companion’s outflow is roughly the orbital velocity, 360 km s⁻¹ 
(ref. 13), the 2-s timescale corresponds to a spatial scale Δx = 720 km. 
For this scale, the expected geometric delay is Δx²/2ac ≈ 0.5 µs (where 
a = 6.4 light-seconds is the orbital separation and c the speed of light). As 
this is comparable to the observed dispersive delays, lensing is expected. 
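
As a rough numerical check of the scales quoted in the paragraph above, the following Python sketch recomputes the spatial scale and the geometric delay. It assumes only the values given in the text (a relative velocity of 360 km/s, a 2-s time resolution and an orbital separation of 6.4 light-seconds); the variable names are illustrative and not taken from the authors' analysis code.

```python
# Order-of-magnitude check of the geometric delay for the 2-s timescale.
C_KM_S = 299_792.458          # speed of light in km/s
v_rel_km_s = 360.0            # assumed relative velocity (orbital velocity)
t_res_s = 2.0                 # time resolution of the delay measurement
a_km = 6.4 * C_KM_S           # orbital separation: 6.4 light-seconds in km

dx_km = v_rel_km_s * t_res_s                          # spatial scale
geometric_delay_s = dx_km**2 / (2 * a_km * C_KM_S)    # dx^2 / (2 a c)

print(f"spatial scale   = {dx_km:.0f} km")
print(f"geometric delay = {geometric_delay_s * 1e6:.2f} microseconds")
```

The result is a little under half a microsecond, the same order as the roughly 1-µs dispersive fluctuations, which is the comparison the argument relies on.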

To model the magnified pulses, we treat the signal using a standard 
wave-optics formalism. The electric field received by an observer is the 
sum of the electric field of the source across the lens plane, with the 
phase at every point determined by both geometric and dispersive 
delays. When a large area on the plane has (nearly) stationary phase, 
the electric field combines coherently, leading to a strongly magnified 
image, with the magnification μ proportional to the area squared. 
Because dispersive and geometric delays scale differently with frequency 
ν, a plasma lens cannot be in focus over all frequencies, but will 
impart a characteristic frequency width. This width will be smaller for 
larger magnification, with the precise scaling depending on the extent 
to which the lens is elliptic: Δν/ν ∼ 1/μ for a very elongated, effectively 
linear lens, and Δν/ν ∼ 1/√μ for a (roughly) circular one. In strong 
lensing, one generically expects multiple images to contribute, and thus 
Fig. 1 | Strong lensing of the pulsar. a, Magnification distributions for the 
main pulse, in 10-s time bins, showing periods of lensing near ingress and 
egress of the eclipse of the pulsar by its companion’s outflow. The greyscale 
is logarithmic, to help individual bright pulses to stand out. b, Pulse 
profiles as a function of time, averaged over 10 s and in 128 phase bins. 
Cyan, yellow and magenta represent contiguous 16-MHz sub-bands, from 
low to high frequency. Near eclipse, plasma dispersion causes frequency- 
dependent delays. Gaps in the data correspond to calibrator scans. 


caustics to form, which can lead to a slope in time-frequency space, 
double-peaked spectra and other interference effects (see Methods). 
All of these are consistent with the behaviour seen in the measured 
spectra of the magnified pulses in Fig. 2. 

For any lens, different pulse components will be magnified differently 
if they arise from regions that have projected separations larger than 
the lens’s resolution. We can make a quantitative estimate of the lens 
resolution by using the magnifications of the events. Continuing to 
assume the lens is roughly co-located with the companion, that is, at 
the orbital separation a, the resolution of the lens for a given magnification 
would be $\approx 1.9R_1/\mu^{1/2}$ for a linear lens, or $\approx 1.9R_1/\mu^{1/4}$ for a circular 
one, where $R_1 = \sqrt{\lambda a/\pi}$, or 23 km at our observing wavelength 
λ = c/ν ≈ 90 cm (see Methods for a derivation). A peak magnification 
of, for instance, μ = 50 then corresponds to a physical resolution of 
about 6 km for a linear lens or about 17 km for a circular one. 
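
The numbers in this paragraph can be reproduced with a short Python sketch. It assumes the scalings read as 1.9 R1/μ^(1/2) for a linear lens and 1.9 R1/μ^(1/4) for a circular one, with R1 = sqrt(λa/π), a wavelength of 90 cm and a separation of 6.4 light-seconds; the function and variable names are hypothetical.

```python
import math

C_KM_S = 299_792.458
wavelength_km = 90e-5            # 90 cm expressed in km
a_km = 6.4 * C_KM_S              # orbital separation in km

# Radius of the equivalent "unlensed" circular aperture, R1 = sqrt(lambda * a / pi)
r1_km = math.sqrt(wavelength_km * a_km / math.pi)

def resolution_linear(mu, r1=r1_km):
    """Resolution of an effectively one-dimensional lens with magnification mu."""
    return 1.9 * r1 / math.sqrt(mu)

def resolution_circular(mu, r1=r1_km):
    """Resolution of a circular lens with magnification mu."""
    return 1.9 * r1 / mu**0.25

mu = 50.0
print(f"R1 = {r1_km:.1f} km")
print(f"linear lens:   {resolution_linear(mu):.1f} km")
print(f"circular lens: {resolution_circular(mu):.1f} km")
```

With these inputs R1 comes out near 23 km and the two resolutions near 6 km and 17 km, in line with the values quoted.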

The inferred resolution is comparable to the radius (approximately 
10 km) of the neutron star, and substantially smaller than the light-cylinder 
radius, $R_{\rm LC} = cP/2\pi = 76$ km, where the velocity of co-rotating 
magnetic fields approaches the speed of light, bounding 


c, d, Enlargements in which each time bin corresponds to an individual 
1.6-ms pulse. One sees extreme chromatic lensing in which pulses 
are magnified by an order of magnitude over tens of milliseconds; the 
brightest event is magnified by a factor of about 40 across our 
highest-frequency sub-band. In some events, the main pulse and interpulse are 
clearly affected differently, indicating that different emission regions 
corresponding to these are resolved by the lensing structures. 
e, f, Enlargements for a quiescent period for comparison. 


the magnetosphere in which pulsar emission is thought to originate 
(ref. 14). Thus, the lensing offers the opportunity to map the emission 
geometry. 

Qualitatively, because the main pulse and interpulse, as well as parts 
of the wide interpulse beam, are sometimes magnified very differently, 
the inferred resolution is a lower limit both to the projected separation 
between the main pulse and the interpulse, and to the size of the inter- 
pulse. Perhaps not unexpectedly, the spatial separations do not seem to 
map directly to rotational phase: we see similar differences between the 
main pulse and the interpulse, which are separated by half a rotation, 
and between parts of the interpulse, which are separated by only about 
0.1 rotation. This may indicate that the interpulse consists of multiple 
components, which are not located close together in space. 

Combining the lens resolution with the timescale of the lensing 
events allows us to constrain the projected relative velocity between the 
lens and the pulsar. A priori, one might expect the outflow velocity to be 
slow, in which case the relative velocity would just be the orbital motion 
of about 360 km s⁻¹. Given that strong magnification events typically 
last about 10 ms, this would then imply a resolution of approximately 
Fig. 2 | A range of spectral behaviour in lensing events. Power spectra, 
binned to 3 MHz, of the main pulse in consecutive 1.6-ms rotations, for 
selected lensing events from eclipse egress. The top panels show the average 
magnification in each main pulse, and the side panels the magnification 
spectra of the brightest pulse (that is, that of pulse number 0). 


4 km, not dissimilar from what we find above, thus suggesting that 
the assumption of a slow outflow velocity is reasonable. By combining 
different constraints between the duration and frequency widths of 
lensing events, we can set a quantitative limit to the relative velocity 
(see Methods for a derivation), of 


$$v \gtrsim 0.5\,\frac{R_1}{\Delta t_{\rm HWHM}} \left(\frac{\Delta\nu}{\nu}\right)^{1/2}_{\rm HWHM} \gtrsim 360~{\rm km~s^{-1}} \qquad (1)$$

where for the numerical value we use the fact that the tightest constraints 
come from the shortest, least chromatic events, which have 
frequency widths $(\Delta\nu/\nu)_{\rm HWHM} \approx 0.1$ and durations $\Delta t_{\rm HWHM} \approx 10$ ms 
(measured as half-width at half-maximum, HWHM; see Fig. 2). This 
approximate limit equals the relative orbital velocity, which implies that 
the outflow velocity can be (but does not have to be) small in the rest 
frame of the companion. 
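
To make the numerical value of the limit concrete, here is a minimal Python sketch. It assumes the limit has the form v ≳ 0.5 (R1/Δt)(Δν/ν)^(1/2), as derived in the Methods, and uses R1 = 23 km, a duration of 10 ms and a fractional frequency width of 0.1; the function name is hypothetical.

```python
import math

def min_relative_velocity_km_s(r1_km, dt_hwhm_s, frac_width_hwhm):
    """Approximate lower bound on the lens-pulsar transverse velocity.

    Implements v >~ 0.5 * (R1 / dt) * sqrt(dnu / nu), with the duration and
    the fractional frequency width both measured as HWHM.
    """
    return 0.5 * (r1_km / dt_hwhm_s) * math.sqrt(frac_width_hwhm)

v_min = min_relative_velocity_km_s(r1_km=23.0, dt_hwhm_s=10e-3, frac_width_hwhm=0.1)
print(f"v >~ {v_min:.0f} km/s")
```

Shorter or less chromatic events push the bound higher, which is why the tightest constraint comes from the shortest, broadest-band events.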

Our results offer many avenues of further research. Ideally, one 
would use the lensing to map the pulsar magnetosphere. This requires 
better constraints on the lenses, for example from observations over 
Fig. 3 | Profiles of lensing events. a, Frequency-averaged pulse profile 
of a 9-min quiescent region, scaled up by a factor of 50 for comparison 
purposes; b, c, d, frequency-averaged pulse profiles surrounding bright 
events (using 128 phase bins). One sees that the events resolve the 
magnetosphere: the main pulse and interpulse are affected differently, as 
are parts within the interpulse. 


several eclipses at a range of frequencies. Further observations may 
also shed light on the eclipse mechanism. For example, if it is 
cyclo-synchrotron absorption, large magnetic fields, of about 20 G, are 
required (ref. 15), which would impart a measurable polarization dependence 
to the lensing events. Furthermore, combining density and velocity into 
a mass-loss rate of the outflow, one can infer the lifetime, and thus the 
final fate, of these systems. 

Finally, our observations establish that radio pulses can be strongly 
amplified by lensing in local ionized material. This adds support 
for the proposal that fast radio bursts (FRBs) can be lensed by 
host galaxy plasma, leading to variable magnification, narrow 
frequency structures and clustered arrival times of highly amplified 
(and thus observable) events (ref. 10). Evidence for high-density environments 
includes that FRB 110523 is scattered within its host galaxy (ref. 16), 
and that FRB 121102 has been localized to a star-forming region (refs. 17, 18), 
in an extreme and dynamic magneto-ionic environment (ref. 19). Indeed, 
the latter’s repeating bursts have spectra (refs. 20–22) remarkably similar to 
those shown in Fig. 2 (for example, compare with Fig. 2 of ref. 20), 
and, like those bursts, the brightest pulses in PSR B1957+20 are highly 
clustered in time. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0133-z. 
METHODS 


Calculating magnifications. The data were taken in four 2.4-h sessions on 2014 
June 13-16 and were recorded as part of a European VLBI network programme 
(GP 052). The data span 311.25-359.25 MHz, in three contiguous 16-MHz sub-bands, 
and we read, de-dispersed (with a dispersion measure of 29.1162 pc cm⁻³) 
and reduced the data as described in ref. 11. As one extra step, we accounted for 
the wander in orbital period of PSR B1957+20 by adjusting the time of ascending 
node in the ephemeris, such that it minimized the scatter of the arrival times of 
giant pulses associated with the main pulse across all four days of observation. 

We define the magnification of a pulse as the ratio of its flux to the mean flux in 
a quiescent region far from eclipse. Specifically, we construct an intensity profile of 
each pulse using 128 phase gates, subtract the mean off-pulse flux in each 16-MHz 
sub-band separately, measure the flux in an eight-gate (approximately 100 µs) 
window around the peak location (which we find from folded profiles, averaged in 
2-s bins to obtain sufficient signal-to-noise), and divide by the average flux in the 
same pulse window measured in a 9-min section far from eclipse. 
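
The magnification definition above can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' pipeline: it handles a single sub-band, uses synthetic top-hat profiles, and the function names, the choice of off-pulse gates and the peak gate are all assumptions made for the example.

```python
from statistics import fmean

def window_flux(profile, peak_gate, off_gates, window=8):
    """Sum of off-pulse-subtracted flux in `window` gates centred on `peak_gate`."""
    baseline = fmean(profile[g] for g in off_gates)
    start = peak_gate - window // 2
    n = len(profile)
    return sum(profile[(start + k) % n] - baseline for k in range(window))

def magnification(profile, quiescent_flux, peak_gate, off_gates, window=8):
    """Ratio of a pulse's windowed flux to the mean quiescent windowed flux."""
    return window_flux(profile, peak_gate, off_gates, window) / quiescent_flux

def make_profile(amplitude):
    """Synthetic 128-gate profile: baseline 1.0 plus a top-hat pulse in gates 60-67."""
    return [1.0 + (amplitude if 60 <= g < 68 else 0.0) for g in range(128)]

quiet = make_profile(2.0)     # stand-in for the average profile far from eclipse
lensed = make_profile(20.0)   # stand-in for one magnified pulse
off = range(0, 32)            # gates assumed to contain no pulsar emission

quiescent_flux = window_flux(quiet, peak_gate=64, off_gates=off)
mu = magnification(lensed, quiescent_flux, peak_gate=64, off_gates=off)
print(mu)
```

In real data the baseline subtraction would be repeated for each 16-MHz sub-band and the peak gate taken from the folded 2-s profiles, as described in the text.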

To construct the spectra of lensing events, we start by binning power spectra in 
3-MHz bins. We correct these approximately for the bandpass and the effects of 
interstellar scintillation (which still has a small amount of power on 3-MHz scales) 
by dividing them by the average spectra in the 15 s before and after (excluding the 
lensing event itself). This time span is chosen to be safely less than the timescale 
of about 84 s on which the interstellar scintillation pattern varies (ref. 11), ensuring the 
dynamic spectrum is stable on these scales. With 30 s of data, it is also very well 
measured: each 3-MHz channel has signal-to-noise ratio S/N = 150. 

Evidence that strong plasma lensing must occur. To obtain the properties of the 
lensed images, and in particular to determine whether we are in the ‘strong’ or 
‘weak’ lensing regime (that is, whether or not multiple images are formed), we use 
the basic principles of wave optics, considering path integrals of the electric field 
over a thin lens”. In our case, the phase of an electromagnetic wave going through 
different paths has contributions from both geometric and dispersive time delays, 


$$\phi(x, y) = \phi_{\rm GM}(x, y) + \phi_{\rm DM}(x, y) \qquad (2)$$


where x and y are in the lens plane. For a geometrically thin lens, that is, with 
thickness along the line of sight much smaller than the separation a, the geometric 
contribution to the phase can be written as 


$$\phi_{\rm GM}(x, y) = \frac{1}{\lambda}\,\frac{x^2 + y^2}{2a} = \frac{x^2 + y^2}{2R_{\rm Fr}^2} \qquad (3)$$

where $R_{\rm Fr}$ is the Fresnel scale, 

$$R_{\rm Fr} = \sqrt{\lambda a} \approx 40~{\rm km} \qquad (4)$$

with λ the wavelength of the radiation. For the numerical value, we used λ = 
c/ν = 90 cm (where ν is the observing frequency) and assumed that the lensing 
material was associated with the companion and thus at the orbital separation of 
a ≈ 6.4 light-seconds. (Note that as we are using wavelengths and frequencies of 
full cycles, our phases are in cycles too, not radians.) 

The dispersive contribution arises from the signal propagating through the lens’s 
extra dispersion measure (electron column density) $\Delta{\rm DM} = \int n_e\,dz$ (with $n_e$ the 
electron number density), 

$$\phi_{\rm DM}(x, y) = -\frac{k_{\rm DM}}{\nu}\,\Delta{\rm DM}(x, y) \qquad (5)$$

where $k_{\rm DM} = e^2/2\pi m_e c$ = 4,148.808 s pc⁻¹ cm³ MHz² (ref. 24). The minus sign arises 
because in a plasma the phase velocity is greater than the speed of light. 
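
A short Python sketch shows how the dispersion constant converts an excess dispersion measure into a delay and a phase. It assumes the standard cold-plasma relations, delay = k_DM ΔDM/ν² and |φ_DM| = k_DM ΔDM/ν in cycles, with ν in MHz; the example dispersion measures (0.001 and 2.6 × 10⁻⁵ pc cm⁻³) are the values read from the Extended Data Fig. 2 caption, and the function names are hypothetical.

```python
K_DM = 4148.808  # dispersion constant in s pc^-1 cm^3 MHz^2, as quoted in the text

def dispersive_delay_s(delta_dm, freq_mhz):
    """Extra arrival-time delay (s) for excess dispersion measure delta_dm (pc cm^-3)."""
    return K_DM * delta_dm / freq_mhz**2

def dispersive_phase_cycles(delta_dm, freq_mhz):
    """Magnitude of the dispersive phase in cycles: frequency (Hz) times delay (s)."""
    return dispersive_delay_s(delta_dm, freq_mhz) * freq_mhz * 1e6

freq = 330.0  # observing frequency in MHz
print(f"{dispersive_delay_s(1e-3, freq) * 1e6:.1f} microseconds for 0.001 pc cm^-3")
print(f"{dispersive_delay_s(2.6e-5, freq) * 1e6:.2f} microseconds for 2.6e-5 pc cm^-3")
print(f"{dispersive_phase_cycles(2.6e-5, freq):.0f} cycles for 2.6e-5 pc cm^-3")
```

The first value matches the 38 µs quoted in the figure caption, and the last is of the order of the roughly 300 cycles used later in the strong-lensing argument.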

Integrating over different paths, one effectively selects regions where the electric 
fields are coherent; these lead to the final images. For instance, if the extra electrons 
are distributed uniformly (in the x-y plane after the z integral), then $\phi_{\rm DM}$ is approximately 
constant and the total phase has a stationary point around x = y = 0, that 
is, around the line of sight. Furthermore, all paths that are less than the Fresnel 
scale away from this central path have similar phase, and thus one recovers the 
general result that the area of the lens that contributes scales with $R_{\rm Fr}^2$. 

For the lensing to be strong, a minimum requirement is that changes in the 
geometric and dispersive phase are of similar magnitude. Because dispersion also 
leads to overall time delays, differences in pulse arrival time give a measure of 
inhomogeneities in the lensing material. We measure pulse arrival times using 
the usual procedure of fitting pulse profiles to a high signal-to-noise template 
(from a quiescent period). We fit profiles in the three bands separately and convert 
the weighted average to ΔDM using standard equations. We find that the 
inferred ΔDM shows both the expected large-scale variations (ref. 7) and considerable 
variability down to the shortest timescales (2 s) at which we can measure it reliably 
(see Extended Data Fig. 2). During the periods in which the strongly magnified 
events occur, we find intrinsic variability corresponding to variations in 
delay of $\Delta t_{\rm DM} \approx 1$ µs, which suggests that the inhomogeneities correspond 
to differences in the dispersion phase of $\Delta\phi_{\rm DM} = \nu\Delta t_{\rm DM} \approx 300$ cycles at 2-s 
timescales. 

To compare this to the geometric phase, we need to translate the timescale 
of 2 s to a length scale. Assuming the relative motion between the pulsar and 
the lensing material is dominated by the orbital motion, of 360 km s⁻¹, the spatial 
scale is 720 km. Using equation (3), this corresponds to $\Delta\phi_{\rm GM}$(720 km) ≈ 200 
cycles. 

The fact that $\Delta\phi_{\rm DM}$ is comparable to and slightly larger than $\Delta\phi_{\rm GM}$ at the distance 
scale of 720 km guarantees that somewhere around this scale, slopes in $\phi_{\rm DM}$ 
and $\phi_{\rm GM}$ will sometimes cancel. It also implies that multiple stationary points with 
$\nabla\phi = 0$ are likely. Because the phases result from a continuous function of x, y and 
ν, the stationary points must emerge or annihilate in pairs, leading to so-called 
caustics, where one has not just $\nabla\phi = 0$ but also $\det(\partial_i\partial_j\phi) = 0$ (ref. 25). 

Given the above, strong lensing is a natural cause of the strongly magnified 
events that we observe. 

A perfect lens model. To calculate properties of the lensed images, we would 
need to perform the path integral. This is not possible, as it requires the value of 
$\phi_{\rm DM}$ over the full two-dimensional lens plane at a resolution better than the Fresnel 
scale, whereas our time-delay measurements are limited to the one-dimensional 
trajectory of the pulsar, and to scales an order of magnitude larger. 

Hence, for our analysis we use a simplified model, amenable to wave-optics 
analysis, in which each strong lensing event is associated with an elliptical perfect 
lens, that is, there is a single focal point to which all paths within the lens contrib- 
ute perfectly coherently, and all paths outside the lens do not contribute at all. 
As shown in Extended Data Fig. 4, our model is parametrized by the location of 
the focal point relative to the centre of the lens, (X, Y), the semi-major and semi-minor 
axes of the lens, ($\Delta x_{\rm lens}$, $\Delta y_{\rm lens}$), and the orientation relative to the projected 
pulsar trajectory. 

In the special case of a centred circular lens with radius R, that is, $\Delta x_{\rm lens} = \Delta y_{\rm lens} = R$ 
and X = Y = 0, the set-up would produce an Airy disk. By comparing this to 
the actual path integral of an unlensed image (that is, with $\phi_{\rm DM} = 0$), we can 
identify the unlensed case with a circular lens with radius $R_1 = R_{\rm Fr}/\sqrt{\pi}$. 

Integrating the electric field over an elliptical lens, the magnification (defined 
as the increase in intensity $\mu = I/\langle I\rangle$, where $I = E^2$) will be proportional to the 
area squared: 

$$\mu = \left(\frac{\Delta x_{\rm lens}}{R_1}\right)^2 \left(\frac{\Delta y_{\rm lens}}{R_1}\right)^2 \qquad (6)$$


Before continuing, we note that at first glance it might seem puzzling for the 
magnification to be proportional to image area squared, as that seems to violate 
energy (flux) conservation. However, the image is also more highly beamed; what 
we calculated is the magnification in the centre of the beam. Viewed from an angle, 
a linear phase term is induced across the originally coherent region. Thus, when 
the region is larger, it is easier to become incoherent. The solid angle scales as 
$\Delta\Omega \propto 1/(\Delta x_{\rm lens}\,\Delta y_{\rm lens})$, and thus total energy (flux) is indeed proportional to the area 
of the lens. 

Differential magnification and chromaticity. We now proceed to derive the phys- 
ical resolution of the lens. The above magnification is reached only when the source 
is exactly at the focal point. When the emission region is some distance away from 
the focal point, paths from it to different parts of the elliptical region will have extra 
phase differences. For instance, when a source is separated from the focal point in 
the x direction by $x_{\rm source} = (X - x_s)$ (where $x_s$ is measured relative to the centre of 
the lens), there is a phase difference across the lens of 

$$\Delta\phi = \frac{x_{\rm source}\,\Delta x_{\rm lens}}{R_1^2} \qquad (7)$$

When this phase difference reaches order 1, the image from the new source location 
will no longer be magnified. Defining the resolution as the offset for which 
total cancellation happens, we find from an explicit integral over the elliptical lens: 

$$x_{\rm res} = \frac{1.9R_1^2}{\Delta x_{\rm lens}} \qquad (8)$$

$$y_{\rm res} = \frac{1.9R_1^2}{\Delta y_{\rm lens}} \qquad (9)$$

Hence, total cancellation occurs on an elliptical beam with semi-minor and 
semi-major axes $x_{\rm res}$ and $y_{\rm res}$ that scale inversely to the semi-major and semi-minor 
axes of the lens. 
We can write the above in terms of the magnification for a specific lens model. 
We will consider two extremes, the first a very anisotropic lens, effectively 
one-dimensional, for which $\Delta y_{\rm lens} = R_1$. For this case, 

$$x_{\rm res} = \frac{1.9R_1}{\mu^{1/2}} \qquad (10)$$

At the opposite extreme, we consider a circular lens, with $\Delta x_{\rm lens} = \Delta y_{\rm lens}$. For 
this case, 

$$x_{\rm res} = \frac{1.9R_1}{\mu^{1/4}} \qquad (11)$$

Here, dividing by the distance a and writing in terms of the diameter $D = 2\Delta x_{\rm lens}$, one 
recovers the usual relation for the angular resolution of a circular lens, 1.22λ/D. 

A similar result can be derived in frequency space. Given the different frequency 
dependencies of the geometric and dispersive phases, a lens cannot be fully in focus 
across all frequencies, but will have some characteristic frequency width. To show 
this, we first assume that the focal point is co-located with the centre of the elliptical 
lens, that is, (X, Y) = (0, 0), and that perfect coherence occurs at some frequency 
$\nu_c$, that is, that at $\nu_c$ one has 

$$\phi_{\rm GM}(x, y, \nu_c) = -\phi_{\rm DM}(x, y, \nu_c) = \frac{\nu_c (x^2 + y^2)}{2ac} \qquad (12)$$

The total phase at a nearby frequency is then given by 

$$\phi(x, y, \nu_c + \Delta\nu) = \left(\frac{\nu_c + \Delta\nu}{\nu_c} - \frac{\nu_c}{\nu_c + \Delta\nu}\right) \frac{\nu_c (x^2 + y^2)}{2ac} \qquad (13)$$

the dominant term of which is linear in Δν. If this term is of order 1 within the 
elliptical region, the image will no longer be magnified. Perfect cancellation does 
not happen in this case, but one can define a characteristic width. Using half-width 
at half-maximum, one finds 

$$\left(\frac{\Delta\nu}{\nu}\right)_{\rm HWHM} = \frac{2.9R_1^2}{\sqrt{3\Delta x_{\rm lens}^4 - 2\Delta x_{\rm lens}^2 \Delta y_{\rm lens}^2 + 3\Delta y_{\rm lens}^4}} \qquad (14)$$


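In terms of the magnification, we find for the special case of an effectively 
one-dimensional lens 

$$\left(\frac{\Delta\nu}{\nu}\right)_{\rm HWHM} \approx \frac{1.7}{\mu} \qquad (15)$$

whereas for a circular lens 

$$\left(\frac{\Delta\nu}{\nu}\right)_{\rm HWHM} \approx \frac{1.5}{\sqrt{\mu}} \qquad (16)$$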


When the focal point is not in the centre of the elliptical region, there will be 
linear terms proportional to XΔν and YΔν that also contribute to the phase. They 
make the dependence on Δν even steeper, which means that equation (14) is an 
upper bound on the frequency width of magnified images. 
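
The two limiting cases can be checked numerically against the general expression. The Python sketch below assumes that equation (14) reads 2.9 R1²/sqrt(3Δx⁴ − 2Δx²Δy² + 3Δy⁴) and that the magnification is μ = (Δx Δy/R1²)²; the specific lens sizes (10 R1 and 3 R1) are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def magnification(dx, dy, r1):
    """Perfect-lens magnification, proportional to the lens area squared."""
    return (dx * dy / r1**2) ** 2

def fractional_width_hwhm(dx, dy, r1):
    """HWHM fractional frequency width for a centred elliptical lens."""
    return 2.9 * r1**2 / math.sqrt(3 * dx**4 - 2 * dx**2 * dy**2 + 3 * dy**4)

r1 = 23.0  # km

# Effectively one-dimensional lens: dy = R1, dx = 10 R1
mu_lin = magnification(10 * r1, r1, r1)
w_lin = fractional_width_hwhm(10 * r1, r1, r1)

# Circular lens: dx = dy = 3 R1
mu_circ = magnification(3 * r1, 3 * r1, r1)
w_circ = fractional_width_hwhm(3 * r1, 3 * r1, r1)

print(f"linear:   width * mu       = {w_lin * mu_lin:.2f}")
print(f"circular: width * sqrt(mu) = {w_circ * math.sqrt(mu_circ):.2f}")
```

The products come out close to the coefficients 1.7 and 1.5 of the two limiting scalings, so the general formula and the special cases are mutually consistent under these assumptions.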

It is possible for the frequency and position shifts to cancel each other, so that 
after moving the source by some distance, it is still strongly magnified at a nearby 
frequency. That leads to a slope of strongest magnification in time-frequency space. 
Following a derivation similar to that outlined above for the behaviour with frequency 
and position separately, we find that the slope is related only to the offset of the source 
from the focal point. For instance, for the case that the source is travelling along the 
semi-major axis of the lens, that is, along a trajectory $(x_s, y_s) = (x_s(t), 0)$, we find 

$$\frac{d\nu}{dx_s} = \frac{\nu}{2x_s} \qquad (17)$$
Finally, it is possible that the pulsar is close to the focus of more than one lens 
at the same time. As the pulsar moves, these multiple focal points lead to multiple 
images which change intensity and interfere with each other. This can lead to a wide 
variety of spectral behaviour, and might well be responsible for the multi-peaked 
structures seen in some panels of Fig. 2. 

A lower bound on the relative velocity. Within the perfect lens approximation, 
one can derive a lower bound on the relative, transverse velocity between the lens 
and the pulsar from the durations and frequency widths of the lensing events. 

The duration of strongly magnified events is predicted to be 

$$\Delta t_{\rm HWHM} \geq \frac{0.7R_1^2}{\Delta x_{\rm lens}\,v} \qquad (18)$$

where the inequality corresponds to making the most conservative estimate that 
the lens is effectively one-dimensional, extended in the x direction, and that the 
full velocity v is directed along it. 

In the frequency space, the same event will have its width bounded from above 
by equation (14): 

$$\left(\frac{\Delta\nu}{\nu}\right)_{\rm HWHM} \leq \frac{1.8R_1^2}{\Delta x_{\rm lens}^2} \qquad (19)$$
Combining the two limits to eliminate $\Delta x_{\rm lens}/R_1$, we find: 

$$v \gtrsim 0.5\,\frac{R_1}{\Delta t_{\rm HWHM}} \left(\frac{\Delta\nu}{\nu}\right)^{1/2}_{\rm HWHM} \qquad (20)$$


Lensing consistency checks. Our analysis in terms of lensing was motivated by 
the facts that the bright pulses before and after eclipse do not look like giant pulses, 
that they occur at specific orbital phases where dispersion measure fluctuations 
are large, and that, in most events, the entire profile increases in flux (although 
not always by the same factor across the profile), with the enhancements lasting 
for several pulses. 

Consistent with the expectations of plasma lensing, in which lenses focus light 
but arrive with associated ‘shadows’ where light is scattered away from our direct 
line of sight, we find that in regions with many strongly lensed pulses, the average 
flux received is unchanged (see Extended Data Fig. 3). Also consistent with plasma 
lensing is that the strongly magnified pulses are highly chromatic, often peaking at 
high or low frequencies, and sometimes showing slopes in frequency-time space 
or interference patterns characteristic of caustics. 

Finally, as a concrete example, the peak magnifications of μ ≈ 70 are comparable 
to the magnifications expected for plasma lensing. The measured dispersive time 
delay changes provide direct evidence that the geometric and dispersive phases 
$\phi_{\rm GM}$ and $\phi_{\rm DM}$ can cancel over distances of about 720 km. The cancellation is likely 
to happen mostly in one spatial direction; setting $\Delta x_{\rm lens} = 720$ km and $\Delta y_{\rm lens} = R_1$, 
our perfect lens model predicts a magnification of the order of 100, comparable 
to the observed value. 

Data availability. The data underlying the figures are available in text files at 
https://github.com/ramain/B1957LensingData 

Code availability. The raw data were read using the baseband package: 
https://github.com/mhvk/baseband 
Extended Data Fig. 1 | Shock geometry. The brown dwarf companion is 
irradiated by the pulsar wind, causing it to be hotter on the side facing the 
pulsar”’, and inflated, nearly filling its Roche lobe”. Outflowing material 
is shocked by the pulsar wind, leaving a cometary-like tail of material. This 
tail is asymmetric because of the companion’s orbital motion, which leads 
to eclipse egress lasting substantially longer than ingress. The companion 
and separations are drawn roughly to scale, whereas the pulsar is not: the 
light cylinder radius of 76 km would be indistinguishable on this figure. 
The inclination of the system is conservatively constrained to 50° < i < 85° 
(ref. '). $v_p$ and $v_c$ are the velocities of the pulsar and companion, a is the 
semi-major axis of the orbit, and $R_{\rm Roche}$ is the Roche lobe of the companion. 
Extended Data Fig. 2 | Dispersion measure near the radio eclipse. Shown is the
excess dispersion measure relative to the interstellar dispersion, with insets
focusing on regions of strong lensing. The excess dispersion is estimated from
delays in pulse arrival times (see Methods); at our observing frequency of 330 MHz,
ΔDM = 0.001 pc cm⁻³ corresponds to a delay of 38 μs. The scatter around the curves
is intrinsic. During periods of strong lensing, it is at a level of
2.6 × 10⁻⁵ pc cm⁻³, corresponding to variations in delay time of 1 μs.
Extended Data Fig. 3 | Pulse profiles and magnification distributions. a–e, Each row
shows pulse profiles for 10-s segments (left, in 128 phase bins), and the
corresponding magnification distributions. The segments are taken from: a, a
quiescent region, showing a log-normal magnification distribution (repeated using a
dotted line in b–e for comparison); b, eclipse ingress, showing some lensing events,
reflected in the tail to high magnification (although hard to see, the average flux
is reduced to about 70% by absorption in eclipsing material); c, the first
post-eclipse lensing period, showing extreme magnifications but little change in
average flux; d, in between the two post-eclipse strong lensing periods, showing
only weak lensing on relatively long, approximately 100-ms timescales; e, the
second post-eclipse period of strong lensing.
Extended Data Fig. 4 | Geometry of a lensing region. An almost edge-on (left) and a
face-on (right) view of the lensing geometry. We assume that the source is at a
separation from the pulsar equal to the semi-major axis a of the binary system and
moves on a trajectory parallel to the lens plane. A source at the focal point (X, Y)
will illuminate the entire elliptical lens coherently, leading to strong
magnification. In general, the focal point may not be at the centre of the ellipse
(although it is likely to be within it), and the source trajectory [x_s(t), y_s(t)]
may not intersect the focal point or the elliptical region.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




https://doi.org/10.1038/s41586-018-0101-7 


An absolute sodium abundance for a cloud-free ‘hot Saturn’ exoplanet


N. Nikolov, D. K. Sing, J. J. Fortney, J. M. Goyal, B. Drummond, T. M. Evans, N. P. Gibson, E. J. W. De Mooij,
Z. Rustamkulov, H. R. Wakeford, B. Smalley, A. J. Burgasser, C. Hellier, Ch. Helling, N. J. Mayne, N. Madhusudhan,
T. Kataria, J. Baines, A. L. Carter, G. E. Ballester, J. K. Barstow, J. McCleery & J. J. Spake


Broad absorption signatures from alkali metals, such as the
sodium (Na I) and potassium (K I) resonance doublets, have long
been predicted in the optical atmospheric spectra of cloud-free
irradiated gas giant exoplanets¹⁻³. However, observations have
revealed only the narrow cores of these features rather than the full
pressure-broadened profiles⁴⁻⁶. Cloud and haze opacity at the day-night
planetary terminator are considered to be responsible for
obscuring the absorption-line wings, which hinders constraints
on absolute atmospheric abundances⁷⁻⁹. Here we report an optical
transmission spectrum for the ‘hot Saturn’ exoplanet WASP-96b
obtained with the Very Large Telescope, which exhibits the
complete pressure-broadened profile of the sodium absorption
feature. The spectrum is in excellent agreement with cloud-free,
solar-abundance models assuming chemical equilibrium. We are
able to measure a precise, absolute sodium abundance of
log ε_Na = 6.9^{+0.6}_{−0.4}, and use it as a proxy for the planet’s atmospheric
metallicity relative to the solar value (Z_p/Z_⊙ = 2.3^{+8.9}_{−1.7}). This result
is consistent with the mass-metallicity trend observed for Solar
System planets and exoplanets¹⁰⁻¹².

We observed two transits of the ‘hot Saturn’ planet WASP-96b
(planetary mass M_p = (0.48 ± 0.03) M_J, where M_J is the mass of Jupiter,
planetary radius R_p = (1.20 ± 0.06) R_J, where R_J is the radius of Jupiter,
and equilibrium temperature T_eq = 1,285 ± 40 K)¹³ on 2017 July 29
and August 22 UT in photometric conditions, using the 8.2-m Unit
Telescope 1 of the Very Large Telescope, with the FORS2 spectrograph.
Data were collected in the multi-object-spectroscopy mode using
grisms 600B (blue) and 600RI (red) on the first and second nights,
respectively, which, when combined, cover the wavelength range
3,600–8,200 Å. We used a mask consisting of two broad slits centred
on the target and on a reference star of similar brightness. Broad slits
spanning 22″ along the dispersion and 120″ along the spatial (perpendicular)
axis were used to minimize slit losses due to seeing variations
and guiding imperfections.

For each transit, we produced wavelength-integrated ‘white’ and
spectroscopic light curves for WASP-96 and the reference star by
integrating the flux of each spectrum along the dispersion axis. We
corrected the light curves for extinction caused by the Earth's atmosphere
by dividing the flux of the target by the flux of the reference star. We
modelled the transit and systematic effects of the white-light curves by
treating the data as a Gaussian process and assuming quadratic limb
darkening for the star. The transit parameters (mid-time T_mid, orbital
inclination i, normalized semi-major axis a/R_*, the planet-to-star radius
ratio R_p/R_* and the two limb-darkening coefficients u_1 and u_2) were
allowed to vary in the fit to each of the two white-light curves, while
the orbital period was held fixed to the previously determined value.
The white-light curves and results from the modelling are shown in
Extended Data Fig. 1 and Extended Data Table 1.
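
The differential correction described above can be illustrated with a minimal sketch. It is not the pipeline used for the analysis; the function name and the flux values are hypothetical, and normalizing by the median ratio is an assumption made only so that the output is a relative flux close to unity.

```python
def differential_light_curve(target_flux, reference_flux):
    """Divide the target flux by the reference flux, then normalise by the median ratio."""
    if len(target_flux) != len(reference_flux):
        raise ValueError("flux series must have equal length")
    ratios = [t / r for t, r in zip(target_flux, reference_flux)]
    ordered = sorted(ratios)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return [r / median for r in ratios]


# Hypothetical counts: a 2% drop in atmospheric transparency affects both
# stars equally in the third exposure and cancels in the ratio.
target = [1000.0, 1000.0, 980.0, 1000.0]
reference = [1500.0, 1500.0, 1470.0, 1500.0]
relative = differential_light_curve(target, reference)
```

Because the transparency change multiplies both stars by the same factor, the relative flux stays flat in this toy case; a real transit would remain as a dip in the target only.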
Fig. 1 | Transmission spectrum of WASP-96b compared to models.
a, Comparison of the FORS2 observations (black dots with 1σ vertical
error bars; the horizontal bars indicate spectral bin widths) with clear,
cloudy and hazy one-dimensional forward atmospheric models at solar
abundance¹⁴ (continuous lines). The two best-fit models assume a clear
atmosphere with different line broadening shapes for Na and K (see text
for details). Models with hazes or clouds (magenta and blue) predict much
smaller and narrower absorption features. b, Similar to a, but showing the
best-fit model obtained from the retrieval analysis (red line) binned to
the data resolution (red dots), with the 1σ, 2σ and 3σ confidence intervals
(dark blue to pale blue regions).
Fig. 2 | Retrieved atmospheric properties for WASP-96b. Histograms of
the marginalized posterior distributions from a free retrieval.
a, b, Negligible opacity from clouds and hazes comprises the evidence for a
clear atmosphere at the limb of the planet. c–e, Retrieved elemental
abundances in the scale of ref. 23, which ranges from 0 to 12 with the
abundance of hydrogen log ε_H = 12. The abundance of Na, log ε_Na = 6.9^{+0.6}_{−0.4},
is the only constrained quantity (c). The vertical continuous and dotted
lines indicate the mean abundances and 1σ uncertainties, respectively.
Shown are the elemental abundances with the uncertainties of the host star
(dotted lines in blue regions) and the Sun (dash-dotted lines in grey
regions²³).

To obtain the transmission spectrum, we produced 28 and 35 spectroscopic
light curves, from the blue and red grisms, respectively, with
a width of 160 Å. Wavelength-independent systematics were corrected
following standard practice, as detailed in the Methods. We allowed
only R_p/R_* and u_1 to vary. The rest of the system parameters were fixed
to their weighted mean values from the analysis of the two white-light
curves, while the quadratic limb-darkening coefficient u_2 was fixed
to its theoretical value. To account for systematics, we marginalized
over a grid of polynomials, where the latter consisted of terms up to
second order in air mass and drift of the spectrum across the detector
along both the dispersion and cross-dispersion axes. The resulting time
series are shown in Extended Data Figs. 2 and 3.

The measured wavelength-dependent relative planet radii are shown
in Fig. 1, which comprises the transmission spectrum of WASP-96b. The
spectrum reveals the absorption signature of the pressure-broadened
sodium D line with wings covering about 6 atmospheric pressure
scale heights (one scale height corresponds to about 610 km, assuming
T_eq = 1,285 K), in a wavelength range of around 5,000–7,500 Å, and
a slope at near-ultraviolet wavelengths due to Rayleigh scattering by
molecular hydrogen. The radius measurements around the potassium
feature show no obvious broadened line wing shape or larger absorption
at the line cores.

To interpret the measured transmission spectrum, we first compare
it with clear, cloudy and hazy atmospheric models with solar abundances
from ref. 14. We find that cloud-free models assuming chemical
equilibrium best fitted the 49 data points, giving χ² = 49 and χ² = 50
for a total of 48 degrees of freedom. Models with clouds and hazes,
that is, 100× enhanced Rayleigh scattering cross-section (haze) and
100× enhanced wavelength-independent (cloud) opacity, give χ² values
of 69 and 76, respectively, and are disfavoured at about 3σ and about
5σ confidence, respectively (Fig. 1). Further details are provided in
Methods.
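
As a rough illustration of this comparison, the sketch below tabulates the reduced χ² and the χ² difference relative to the best model. The values 49 and 50 (the two clear models), 69 (haze) and 76 (cloud) and the 48 degrees of freedom are read from the text above; the dictionary labels are hypothetical, and no attempt is made to reproduce the quoted confidence levels, which depend on details given in the Methods.

```python
DEGREES_OF_FREEDOM = 48

# chi-square values as read from the text; the labels are illustrative only
chi_square = {
    "clear, wing shape A": 49.0,
    "clear, wing shape B": 50.0,
    "haze": 69.0,
    "cloud": 76.0,
}


def reduced_chi_square(chi2, dof):
    """Chi-square per degree of freedom; values near 1 indicate an adequate fit."""
    return chi2 / dof


best = min(chi_square.values())
summary = {
    name: (reduced_chi_square(value, DEGREES_OF_FREEDOM), value - best)
    for name, value in chi_square.items()
}

for name, (reduced, delta) in summary.items():
    print(f"{name:22s} reduced chi2 = {reduced:.2f}  delta chi2 = {delta:.0f}")
```

Both clear models have a reduced χ² close to unity, whereas the hazy and cloudy models sit well above it.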

The wing shape of atomic absorption lines is a result of the combined
contribution of the quantum mechanical (natural), thermal
(or Doppler) and collisional (or pressure) broadening mechanisms.
Measurements of the shape of pressure-broadened line wings can
provide important constraints on the interaction potentials used in
the theory of stellar and sub-stellar atmospheres¹⁶,¹⁷. Although such
constraints have been obtained from Na and K absorption lines in
the spectra of brown dwarfs¹⁸, the actual shape of the profiles for
exoplanets remains unconstrained. To assess the detection of sodium
line-broadening we compared the spectrum to models with no broadened
lines. Compared to the best-fit clear-atmosphere model, the
narrow-line model is found to be rejected at the 5.8σ confidence level.
This is in contrast to WASP-39b, WASP-17b and HD209458b, which
have previously been classified as having the clearest atmospheres of the
known exoplanets. The latter transmission spectra are well explained
with narrow alkali features, implying that the broad absorption wings
are masked by clouds and hazes.

The broad sodium feature measured for WASP-96b therefore 
provides a unique opportunity to constrain the pressure-broadened 
line shape for an exoplanet atmosphere. We compared the observed 
spectrum to two cloud-free models, assuming alkali line-wing shapes 
from refs *!°. We find each of them to be statistically consistent with the 
data, although the wing profile of ref. * is marginally preferred (Fig. 1, 
red and orange models). 

Fig. 3 | Mass-metallicity diagram for Solar System planets and
exoplanets. Methane (CH₄) and water (H₂O) are the two absorbing
constituents used to constrain the atmospheric metallicity of Solar System
planets (blue bars) and hot gas giant exoplanets (orange squares with grey
error bars), respectively. Absorption lines from atomic Na (red triangles
and error bars) can provide another proxy for exoplanet atmospheric
metallicity, by combining three Hubble Space Telescope (HST) and two
Very Large Telescope (VLT) transits for WASP-39b. With its detected
and resolved pressure-broadened Na line wings, WASP-96b is the first
transiting exoplanet for which high-precision atmospheric metallicity
has been constrained using data only from the ground. Each error bar
corresponds to the 1σ uncertainty. The blue line indicates a fit to the Solar
System gas giants (pale blue symbols indicate Solar System planets).

To further interpret the physical properties of WASP-96b’s atmosphere,
we performed a retrieval analysis of the data using the
one-dimensional radiative-convective ATMO model. We assumed an
isothermal atmosphere and allowed the temperature, radius, opacity
from clouds and hazes and the elemental abundances of Na and K to
vary. In addition, Li is expected to add opacity at about 6,650 Å, which
is covered by three of our measurements. Throughout this Letter, we
adopt the astronomical scale of logarithmic abundances of ref. 23 where
hydrogen (H) is defined to be log ε_H = 12. The abundance of a particular
element X is defined as log ε_X = log(N_X/N_H) + 12, where N_X and N_H are
the number densities of elements X and H. Our retrieval analysis finds
negligible contributions from cloud or haze opacity, which indicates
that the atmosphere of WASP-96b is free of clouds and hazes at the
pressures being probed at the limb. The best-fit transmission spectrum
includes opacity from Na, Li, K and Rayleigh scattering (Fig. 1). We
obtain a tight constraint of log ε_Na = 6.9^{+0.6}_{−0.4} on the sodium abundance,
which is in agreement with the solar abundance as well as with the
measured sodium abundance in the WASP-96 host star
(Fig. 2). The best-fit model gives χ² = 39 for 42 degrees of freedom.
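
To make the abundance scale concrete, the following sketch converts between the logarithmic abundance and the number-density ratio using the definition given above, with hydrogen fixed at 12. The function names are hypothetical; only the central value 6.9 of the retrieved sodium abundance is used.

```python
import math


def number_ratio_from_log_eps(log_eps):
    """Return N_X / N_H for a logarithmic abundance on the scale where H = 12."""
    return 10.0 ** (log_eps - 12.0)


def log_eps_from_number_ratio(ratio):
    """Inverse conversion: logarithmic abundance from the ratio N_X / N_H."""
    return math.log10(ratio) + 12.0


# Central value of the retrieved sodium abundance
na_to_h = number_ratio_from_log_eps(6.9)
print(f"N_Na / N_H = {na_to_h:.2e}")
```

On this scale the central value corresponds to roughly eight sodium atoms per million hydrogen atoms.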

The current data do not support detections of K or Li, as the minimum
χ² value when excluding the two species is only slightly higher
than when they are included (Δχ² = 2). However, we include the two
species in our retrieval model to marginalize the Na abundance over
the possibility of their presence and estimate upper limits on their
abundances. The abundances of K and Li are also found to depend on
the assumed profile shape of the Na feature. We find an atmospheric
temperature of T = 1,710^{+150}_{−200} K, which is somewhat higher than
the planet's equilibrium temperature of T_eq = 1,285 ± 40 K under
the assumption of zero albedo and uniform day-night heat
redistribution¹³.

Heavy-element abundance measurements are important to constrain
formation mechanisms of gas-giant exoplanets. According to the
core-accretion paradigm, as the planet mass decreases, the atmospheric
metallicity increases²⁴,²⁵. Giant planets accrete H/He-dominated gas as
they form, and they also accrete planetesimals²⁶ that enrich their H/He
envelopes in metals. A low-mass H/He envelope has a smaller amount
of gas for these metals to be mixed into, leading to a higher metal
enrichment compared to the parent star. This is also the scenario for
Solar System gas giants, where metallicity has been constrained from
methane (CH₄) abundance from in situ or infrared spectroscopy²⁷⁻³⁰,
showing increasing enrichment of heavy elements with decreasing mass
(Fig. 3). Measurements of H₂O abundances have been used to constrain
atmospheric metallicities for a small sample of exoplanets¹⁰⁻¹². The
measured molecular abundances are used as proxies to atmospheric
metallicities, assuming chemical equilibrium conditions. Using our
measurement of the absolute sodium abundance of WASP-96b, we
estimate an atmospheric metallicity of Z_p/Z_⊙ = 2.3^{+8.9}_{−1.7}, that is,
log(Z_p/Z_⊙) ≈ 0.4. This is consistent with the heavy-element abundance
of the host star, Z_*/Z_⊙ ≈ 1.4, which we estimate using the
relation Z_*/Z_⊙ = 10^[Fe/H], where [Fe/H] = 0.14 ± 0.19. While our WASP-96b
measurement is consistent with the Solar System mass-metallicity
trend (see Fig. 3), we note that additional high-precision constraints
would be necessary to further support or refute a trend for
exoplanets.
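
The stellar relation quoted above can be evaluated directly. The sketch below is illustrative only: it propagates the ±0.19 dex uncertainty by evaluating the relation at the bounds, which is an assumption about the procedure rather than the method confirmed by the text, and the function name is hypothetical.

```python
def metallicity_ratio(fe_h, fe_h_err):
    """Return Z*/Z_sun = 10**[Fe/H] with upper and lower offsets from the bounds."""
    central = 10.0 ** fe_h
    upper = 10.0 ** (fe_h + fe_h_err)
    lower = 10.0 ** (fe_h - fe_h_err)
    return central, upper - central, central - lower


central, plus, minus = metallicity_ratio(0.14, 0.19)
print(f"Z*/Z_sun = {central:.2f} (+{plus:.2f}, -{minus:.2f})")
```

Because the relation is exponential, a symmetric uncertainty in [Fe/H] maps onto an asymmetric uncertainty in the linear metallicity ratio, with the upper offset larger than the lower one.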

WASP-96b is the first exoplanet for which the pressure-broadened
wings of an atomic absorption line (Na I) have been observed, probing
deeper layers of the atmosphere at the limb. This observation has also
enabled a precise atmospheric abundance constraint, using ground-based
data alone. Our result demonstrates that combined with near-ultraviolet
data, the Na absorption feature at approximately 5,890 Å is
a valuable probe of exoplanet metallicities accessible to ground-based
telescopes over a wavelength region largely free of contamination 
by telluric lines. WASP-96b is the first gas giant of approximately 20 
exoplanets so far characterized in transmission, to our knowledge, to 
have a broad atomic absorption feature detected. This demonstrates the 
important role a future ground-based optical spectrograph, optimized 
for transmission spectroscopy, could play. With the clearest atmosphere 
of any exoplanet characterized so far, WASP-96b will be an important 
target for the upcoming James Webb Space Telescope. 


Online content 

Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0101-7.
METHODS

Observations. We observed two transits of WASP-96b with the FOcal Reducer
and Spectrograph (FORS2)³¹ attached on the Unit Telescope 1 (Antu) of the VLT
at the European Southern Observatory on Cerro Paranal in Chile as part of Large
Program 199.C-0467 (Principal Investigator N.N.). We used an observing setup
and strategy similar to our VLT FORS2 Comparative Transmission Spectroscopy
of WASP-39b and WASP-31b²²,³².

During the two transits, we monitored the flux of WASP-96 and one reference
star in photometric conditions. The reference star, known as 2MASS 00041885-
4716309, is the only bright source in the FORS2 field of view and is located at
an angular separation of 5.3′ from the target. Fortunately, the reference is
of similar colour and brightness, which reduced the effect of differential colour
extinction. For example, the magnitude differences (target minus reference) from
the PPMXL³³ catalogue are ΔB = −0.46, ΔR = −0.49 and ΔI = −0.5. We observed
both transits with the same slit mask and the red detector (Massachusetts Institute
of Technology), which is a mosaic of two chips. We positioned the instrument field
of view such that each detector imaged one source. The field of view was monitored
without guiding interruptions during the full observing campaigns. To improve
the duty cycle, we made use of the fastest available read-out mode (200 kHz, about
30 s). On both nights, we ensured that the Longitudinal Atmospheric Dispersion
Corrector was in its neutral position, that is, inactive.

During the first night, we used the dispersive element GRIS600B (hereafter
blue and 600B), which covers the spectral wavelength range λ = 3,600–6,200 Å at
a resolving power of R ≈ λ/Δλ = 600. The field rose from an air mass of
1.43 to 1.08 and then set to an air mass of 1.16. The seeing oscillated around 0.5″
during the first 3.5 h and gradually increased to 1.2″ at the end of the observation.
We collected a total of 89 exposures for about 5 h with integration times adjusted
between 120 s and 230 s.

During the second night, we exploited the dispersive element GRIS600RI (hereafter
red and 600RI), which covers the range 5,400–8,200 Å, in combination with
the GG435 filter, to isolate the first order. The field rose from an air mass of
1.23 to 1.08 and then set to an air mass of 1.36. The seeing varied between 0.3″ and
0.5″ as measured from the cross-dispersion profiles of the spectra. We monitored
WASP-96 and the reference star for about 5 h 20 min and collected a total of 233
spectra with integration times between 30 s and 80 s.
Calibrations and data reduction. We performed data reduction and analysis using
a customized Interactive Data Language (IDL) pipeline²². We started by subtracting
a bias frame and by applying a flat field correction to the raw images. We computed
a master bias and flat field by obtaining the median of 100 individual frames.
Cosmic rays were identified and corrected following the routine detailed in ref. 34.
We extracted one-dimensional spectra using the Image Reduction and Analysis
Facility (IRAF)’s APALL task. To trace the stars, we fitted a two-parameter
Chebyshev polynomial. We performed background correction by subtracting
the median background from the stellar spectrum for each wavelength, computed
from a box located away from the spectral trace. We found that aperture radii of
14 and 12 pixels and sky regions 21 to 72 pixels (where the zero point is the middle
of the spatial profile of the spectrum) and 23 to 74 pixels minimize the dispersion of the
out-of-transit flux of the band-integrated white light curves for the blue and red
observations, respectively.

We performed a wavelength calibration of the extracted stellar spectra using
spectra of an emission lamp, obtained after each of the two transit observations
with a mask identical to the science mask, but with slit widths of 1″. We
established a wavelength solution for each of the two stars with a low-order Chebyshev
polynomial fit to the centres of a dozen lines, which we identified by performing a
Gaussian fit. To account for displacements during the course of each observation
and relative to the reference star, we placed the extracted spectra on a common
Doppler-corrected rest frame through cross-correlation. All spectra were found to
drift in the dispersion direction by no more than 2.5 pixels, with instrument gravity
flexure being the most likely reason.

Example spectra of WASP-96 and the reference star are shown in Extended 
Data Fig. 1. We achieved typical signal-to-noise ratios of 315 and 280 per pixel 
for the central wavelength of the blue grism and 313 and 257 for the red grism, 
respectively. We then used the extracted spectra to produce band-integrated white 
and spectroscopic light curves for each source and transit by summing up the flux 
along the dispersion axis in each bandpass. 

White-light curve analysis. We produced white-light curves from 4,013 Å to
6,173 Å and from 5,293 Å to 8,333 Å for the blue and red observations,
respectively. We corrected the raw flux of the target by dividing by the raw flux of the
reference star. This correction removes the contribution of Earth’s atmospheric
transparency variations, as demonstrated in Extended Data Fig. 1. We modelled
the white light transits and instrumental systematics simultaneously by treating
the data as a Gaussian process³⁵⁻³⁷. We performed the Gaussian process analysis
using the Python Gaussian process library George³⁸⁻⁴¹. Under the Gaussian process
assumption, the data likelihood is a multivariate normal distribution with a mean
function μ describing the deterministic transit signal and a covariance matrix K
that accounts for stochastic correlations (that is, poorly constrained systematics)
in the data:

$$p(f \mid \theta, \gamma) = \mathcal{N}(\mu, K)$$

where p is the probability density function, f is a vector containing the flux
measurements, θ is a vector containing the mean function parameters, γ is a vector
containing the covariance parameters and N is a multivariate normal distribution.
We defined the mean function μ as follows:

$$\mu(t, \hat{t}; c_0, c_1, \theta) = [c_0 + c_1 \hat{t}]\, T(t; \theta)$$

where t is a vector of all central exposure time stamps in Julian Date, t̂ is a vector
containing all standardized times, that is, with subtracted mean exposure time and
divided by the standard deviation, c_0 and c_1 describe a linear baseline trend, T(θ) is an
analytical expression describing the transit and θ = (i, a/R_*, T_mid, R_p/R_*, u_1, u_2),
where i is the orbital inclination, a/R_* is the normalized semi-major axis, T_mid is the
central transit time, R_p/R_* is the planet-to-star radius ratio, and u_1 and u_2 are the
linear and quadratic limb darkening coefficients. To obtain an analytical transit
model T(θ), we used the formulae found in ref. 42. We fixed the orbital period to its
value from ref. 13 and fitted for the remaining system parameters.
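
The structure of the mean function, a linear baseline in standardized time multiplied by a transit model, can be sketched as follows. This is a simplified illustration, not the model used in the analysis: the box-shaped transit is a hypothetical placeholder for the analytical limb-darkened formulae, the population standard deviation is an assumed choice for the standardization, and all numerical values are invented.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def box_transit(times, t_mid, duration, depth):
    """Placeholder transit: flux 1 out of transit, 1 - depth in transit (no limb darkening)."""
    return [1.0 - depth if abs(t - t_mid) <= 0.5 * duration else 1.0 for t in times]


def mean_function(times, c0, c1, t_mid, duration, depth):
    """Linear baseline in standardized time multiplied by the transit model."""
    t_hat = standardize(times)
    transit = box_transit(times, t_mid, duration, depth)
    return [(c0 + c1 * th) * tr for th, tr in zip(t_hat, transit)]


# Hypothetical time stamps (days) and parameters
times = [0.0, 0.1, 0.2, 0.3, 0.4]
model = mean_function(times, c0=1.0, c1=0.0, t_mid=0.2, duration=0.15, depth=0.014)
```

Standardizing the time axis keeps the baseline coefficients of order unity, which tends to make uniform priors on c_0 and c_1 easier to choose.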

We accounted for the stellar limb-darkening by adopting the two-parameter
(u_1, u_2) quadratic law and computed the values of the coefficients using a
three-dimensional stellar atmosphere model grid⁴³. In these calculations, we adopted the
closest match to the effective temperature, surface gravity and metallicity of the
exoplanet host star found in ref. 13. The choice of a quadratic versus a more complex
law (such as a four-parameter nonlinear law⁴⁴) was motivated by the study of
refs 44–46, in which the two-parameter law has been shown to introduce negligible
bias on the measured properties of transiting systems similar to WASP-96. In
addition, the quadratic law requires a much shorter computational time to determine
the relevant transit light curve. We computed the theoretical limb-darkening by
fitting the limb-darkened intensities of the three-dimensional stellar atmosphere
models, factored by the throughputs of the blue and red grisms.

The covariance matrix is defined as K_ij = σ_i² δ_ij + k_ij, where σ_i are the photon
noise uncertainties, δ_ij is the Kronecker delta function and k_ij is a covariance
function. We assumed the white noise term σ_w was the same for all data points and
allowed it to vary as a free parameter. For the covariance function, we chose to use
the Matérn ν = 3/2 kernel with the spectral dispersion and cross-dispersion drifts
x and y as input variables, and the full-width at half-maximum (FWHM) measured
from the cross-dispersion profiles of the two-dimensional spectra and the speed
of the rotation angle z (see Extended Data Fig. 4). As with the linear time term, we
also standardized the input parameters before the light curve fitting. We chose to
use the dispersion and cross-dispersion drifts for both observations, and combined
them with the FWHM for the blue data and the speed of the rotation angle for the
red data, respectively. Our choice was justified by the fact that those
combinations of input parameters gave well behaved residuals. The covariance function
was then defined as:

$$k_{ij} = A^2 \left(1 + \sqrt{3}\, D_{ij}\right) \exp\left(-\sqrt{3}\, D_{ij}\right)$$

where A is the characteristic correlation amplitude and

$$D_{ij} = \sqrt{\frac{(\hat{x}_i - \hat{x}_j)^2}{\tau_x^2} + \frac{(\hat{y}_i - \hat{y}_j)^2}{\tau_y^2} + \frac{(\hat{z}_i - \hat{z}_j)^2}{\tau_z^2}}$$

where τ_x, τ_y and τ_z are the correlation length scales and the hatted variables are
standardized. We allowed parameters X = (c_0, c_1, T_mid, i, a/R_*, R_p/R_*, u_1, u_2) and
Y = (A, τ_x, τ_y, τ_z) to vary and fixed the orbital period P to its literature value¹³.
We adopted uniform priors for X and log-uniform priors for Y.
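
A minimal pure-Python sketch of a covariance matrix of this form is given below. It is not the George implementation used for the analysis. The scaled distance is assumed to be the quadrature sum of the differences in each standardized input divided by its length scale, which is the usual form for a Matérn ν = 3/2 kernel with one length scale per input; the input points and hyperparameter values are hypothetical.

```python
import math


def matern32(point_i, point_j, amplitude, length_scales):
    """Matern nu = 3/2 covariance between two points in standardized input space."""
    d_squared = sum(
        ((a - b) / tau) ** 2 for a, b, tau in zip(point_i, point_j, length_scales)
    )
    d = math.sqrt(d_squared)
    root3_d = math.sqrt(3.0) * d
    return amplitude ** 2 * (1.0 + root3_d) * math.exp(-root3_d)


def covariance_matrix(points, amplitude, length_scales, white_noise):
    """K_ij = sigma_w^2 * delta_ij + k_ij, with a common white-noise term sigma_w."""
    n = len(points)
    return [
        [
            matern32(points[i], points[j], amplitude, length_scales)
            + (white_noise ** 2 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical standardized inputs (x drift, y drift, third variable) per exposure
points = [(0.0, 0.0, 0.0), (1.0, 0.5, -0.5), (2.0, 1.0, 0.3)]
K = covariance_matrix(points, amplitude=1e-3, length_scales=(1.0, 2.0, 1.5), white_noise=2e-4)
```

Exposures that are close in the auxiliary inputs are strongly correlated, while the correlation decays as the scaled distance grows; the white-noise term only contributes on the diagonal.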

To marginalize the posterior distribution p(θ, γ | f) ∝ p(f | θ, γ) p(θ, γ) we
made use of the Markov chain Monte Carlo software package emcee⁴⁷. We identified
the maximum likelihood solution using the Levenberg-Marquardt least-squares
algorithm⁴⁸ and initialized three groups of 150 walkers close to that maximum. We
ran groups one and two for 350 samples each and the third group for 4,500 samples.
Before running the second group we re-sampled the positions of the walkers in
a narrow space around the position of the best walker from the first run. This extra
re-sampling step was useful because otherwise some of the walkers can start in a
low-likelihood area of parameter space and require more computational time to
converge. Transit models for each of the two observations computed using the
marginalized posterior distributions are shown in Extended Data Fig. 1 and the
relevant parameter values are reported in Extended Data Table 1. We find residual
dispersion of 79 and 203 parts per million for the blue and red light curves,
respectively. Both values are found to be within 80% of the theoretical photon noise limit.
We computed the weighted mean values of the orbital inclination and semi-major
axis and repeated the fits. In the second fit we allowed only the planet-to-star
radius ratio R_p/R_* and the two limb darkening coefficients u_1 and u_2 to vary, while
the orbital inclination and semi-major axis were held fixed to the weighted mean
values and the central times were fixed to the values determined from the first fit.
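
The combination of the two nights can be illustrated with an inverse-variance weighted mean. This is a sketch under the assumption of independent Gaussian uncertainties; the text does not specify the weighting, and the inclination values used here are hypothetical rather than taken from Extended Data Table 1.

```python
def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, (1.0 / total) ** 0.5


# Hypothetical inclinations (degrees) from the blue and red white-light fits
incl, incl_err = weighted_mean([85.0, 85.4], [0.2, 0.2])
print(f"i = {incl:.2f} +/- {incl_err:.2f} deg")
```

With equal uncertainties the result reduces to the simple average, and the combined uncertainty shrinks by a factor of the square root of two.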
Spectroscopic light curve analysis. We produced spectroscopic light curves by
summing the flux of the target and reference star in bands with a width of 160 Å.
The sodium D lines at 5,890 Å and 5,896 Å fall inside the spectral range of each
grism, within their overlapping region of 5,300–6,200 Å. We centred the set of
bins for each night on the sodium line, which gave the advantage of obtaining two
radius measurements identical in wavelength coverage within that overlapping
region. We merged two pairs of bins, covering the O₂ A and B bands from 7,594 Å
to 7,621 Å and from 6,867 Å to 6,884 Å, respectively, to increase the signal-to-noise
ratio of the corresponding light curves. The very first band in the blue grism was
also enlarged with the same motivation. With these customizations we produced
a total of 63 light curves.

Common mode factors. The FORS2 spectroscopic light curves are known to
exhibit wavelength-independent (common mode) systematic effects, as
demonstrated from our Comparative Transmission Spectroscopy studies²²,³² and other
FORS2 results⁴⁹,⁵⁰. This makes the instrument outstanding for transmission
spectroscopy, with enormous potential to explore the diversity of exoplanet
atmospheres. We established the wavelength-independent systematic effects using the
band-integrated white-light curves for each of the two nights. We simply divided
the white-light transit light curves by a transit model. We computed the transit
model using the weighted mean values of the orbital inclination and normalized
semi-major axis from both nights and assumed the central times found from the
white-light analysis. The values for the relative radius and the limb-darkening
coefficients were identified by repeating the Gaussian process fit where the
transit central time, orbital inclination and semi-major axis were held fixed to the
weighted-mean values. The fitted relative radii and limb-darkening coefficients
are reported at the end of Extended Data Table 1. The common mode factors for
each night are shown in Extended Data Fig. 1 along with a schematic explanation
of the full white-light curve analysis.
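
The common-mode procedure amounts to two divisions, which the following sketch illustrates with invented numbers. The function names, fluxes and transit depths are hypothetical; the systematic trend is assumed to be identical in the white-light and spectroscopic channels, which is the defining property of a wavelength-independent effect.

```python
def common_mode(white_flux, transit_model):
    """Common-mode factor: white-light curve divided by its transit model."""
    return [f / m for f, m in zip(white_flux, transit_model)]


def remove_common_mode(spectroscopic_flux, common):
    """Divide a spectroscopic light curve by the common-mode factor of the same night."""
    return [f / c for f, c in zip(spectroscopic_flux, common)]


# Hypothetical example: one systematic trend shared by all wavelengths
systematics = [1.001, 0.999, 1.002]
transit_white = [1.0, 0.986, 1.0]   # assumed white-light transit model
transit_bin = [1.0, 0.985, 1.0]     # assumed transit in one spectroscopic bin

white = [t * s for t, s in zip(transit_white, systematics)]
spec = [t * s for t, s in zip(transit_bin, systematics)]

cm = common_mode(white, transit_white)
cleaned = remove_common_mode(spec, cm)
```

In this idealized case the cleaned spectroscopic light curve recovers the bin's own transit shape; in real data, wavelength-dependent residuals remain and are handled by the polynomial systematics models.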

Spectroscopic light curve fits. We modelled the spectroscopic light curves using a 
two-component function that takes into account the systematics and transit simul- 
taneously. The transit model was computed using the analytical formulae of ref. *”, 
as for the white-light curve, but in the fits we allowed only the relative planet radius 
R,/R«and the linear limb-darkening coefficient u to vary. Fitting only for the linear 
limb-darkening coefficient is standard practice for ground-based observations 
and has proved generally to perform well®!~°°. Similar to our WASP-39b study 
with FORS2”, we also fitted for both limb-darkening coefficients and found that 
the uncertainty of uw is large and consistent with the theoretical prediction. We 
interpret this as an indication for insufficient constraining power of the data for the 
nonlinear coefficient. However, as the transmission spectra did not substantially 
change we chose to fix u and to fit only for 1. Prior to fitting the spectroscopic 
light curves, we removed the common mode factors from each night by dividing 
each of the spectroscopic light curves by the corresponding common-mode light 
curve of the same night. We accounted for the systematics using a low-order 
polynomial (up to second degree with no cross terms) of dispersion and cross- 
dispersion drift, air mass, FWHM variations and the rate of change of the rotator 
angle (Extended Data Fig. 4). We produced all possible combinations of detrending 
variables and performed separate fits with each combination with the systematics 
function included in the two-component model. For each attempted function, we 
computed the Akaike Information Criterion®” to estimate the statistical weight of 
the model depending on the number of degrees of freedom. We marginalized the 
resulting relative radii and linear limb-darkening coefficient following ref. **. We 
chose to rely on the Akaike Information Criterion instead of other information 
criteria, for example, the Bayesian Information Criterion®, because the Akaike 
Information Criterion selects more complex models, resulting in more conserv- 
ative error estimates. We note that a marginalization over multiple systematics 
functions relies on the assumption of equal prior weights for each tested model. 
This assumption is valid for simple polynomial expansions of basis parameters, like 
the ones in our study. We found systematics models parameterized with linear air 
mass, dispersion drift and FWHM terms to result in the highest statistical weight. 
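
The weighting step described above can be illustrated with a short sketch. It assumes Gaussian errors, so that the Akaike Information Criterion reduces to $\chi^2 + 2k$ up to a constant, and it uses the common Akaike-weight form $\exp(-\Delta\mathrm{AIC}/2)$ with equal prior weight for every model. The function name `aic_marginalize` and its arguments are hypothetical and are not taken from the authors' code; the marginalization formula of the cited reference may differ in detail.

```python
import numpy as np

def aic_marginalize(chi2, n_params, values, errors):
    """Combine per-model estimates of one parameter using Akaike weights.

    chi2, n_params: best-fit chi-square and number of free parameters per model.
    values, errors: the fitted parameter (e.g. relative radius) and its
    uncertainty from each systematics model.
    """
    chi2 = np.asarray(chi2, dtype=float)
    k = np.asarray(n_params, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)

    aic = chi2 + 2.0 * k                # AIC for Gaussian errors, up to a constant
    delta = aic - aic.min()             # subtract the minimum for numerical stability
    weights = np.exp(-0.5 * delta)
    weights /= weights.sum()            # normalise so the weights sum to one

    mean = np.sum(weights * values)
    # Total variance: each model's own variance plus the scatter between models.
    var = np.sum(weights * (errors**2 + (values - mean) ** 2))
    return mean, np.sqrt(var), weights
```

Because the between-model scatter enters the variance, a set of models that disagree with each other yields a larger marginalized uncertainty than any single model would report.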

Prior to each fit, we set the uncertainties of each spectrophotometric channel 
to values that are based on the expected photon noise with additional component 
from readout noise. We determined best-fit models using a Levenberg-Marquardt 
least-squares algorithm and rescaled the uncertainties of the fitted parameters with 
the dispersion of the residuals. All residual outliers larger than 3σ were excluded
from the analysis. We found that for each spectroscopic light curve fewer than three 
data points were removed. We also assessed the levels of correlated residual red 
noise, following the methodology of ref. © by modelling the binned variance with
the $\sigma^2 = \sigma_w^2/N + \sigma_r^2$ relation, where $\sigma_w$ is the uncorrelated white noise component,
$N$ is the number of measurements in the bin and $\sigma_r$ is the red noise component.
We find white and red noise dispersion in the range 400-1,000 and 20-80 parts
per million, respectively.
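
A minimal sketch of the binning test follows. It assumes the relation $\sigma^2 = \sigma_w^2/N + \sigma_r^2$ is fitted by ordinary linear least squares to the variance of the bin means; the function names are hypothetical, and the fitting procedure of the cited reference may differ.

```python
import numpy as np

def binned_variance(residuals, n):
    """Variance of the means of consecutive, non-overlapping bins of n points."""
    residuals = np.asarray(residuals, dtype=float)
    n_bins = residuals.size // n
    means = residuals[: n_bins * n].reshape(n_bins, n).mean(axis=1)
    return means.var()

def fit_white_red(bin_sizes, variances):
    """Least-squares fit of var(N) = sigma_w**2 / N + sigma_r**2."""
    bin_sizes = np.asarray(bin_sizes, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # The relation is linear in (sigma_w**2, sigma_r**2).
    design = np.column_stack([1.0 / bin_sizes, np.ones_like(bin_sizes)])
    (sw2, sr2), *_ = np.linalg.lstsq(design, variances, rcond=None)
    # Clip at zero so noisy data cannot yield an imaginary amplitude.
    return np.sqrt(max(sw2, 0.0)), np.sqrt(max(sr2, 0.0))
```

In use, `binned_variance` would be evaluated for a range of bin sizes on the light-curve residuals and the resulting variances passed to `fit_white_red`.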

The measured wavelength-dependent relative radii and corresponding light 
curves are given in Fig. 1, Extended Data Figs. 2-5, and Extended Data Table 2 
and comprise the transmission spectrum of WASP-96b. We used the overlapping 
wavelength region to combine the two observations and computed the weighted 
mean of both datasets. We detect a marginally significant 1.4σ difference in the
transit depths of the light curves from the two observations of $(7.2 \pm 5.2) \times 10^{-4}$.
This level of variation is consistent with the photometric variability, associated 
with the active regions on the star surface with variability of 7< 9.2 x 10-4 (ref. 1). 
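
The two operations in this paragraph, an inverse-variance weighted mean and the significance of a difference, can be sketched as follows. The helper names are hypothetical, and symmetric Gaussian uncertainties are assumed.

```python
import numpy as np

def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    values = np.asarray(values, dtype=float)
    w = 1.0 / np.asarray(errors, dtype=float) ** 2
    mean = np.sum(w * values) / np.sum(w)
    return mean, np.sqrt(1.0 / np.sum(w))

def significance(difference, uncertainty):
    """Size of a difference in units of its own uncertainty."""
    return abs(difference) / uncertainty

# Transit-depth difference quoted in the text: (7.2 +/- 5.2) x 10^-4
print(round(significance(7.2e-4, 5.2e-4), 1))  # 1.4
```
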
Modelling of the atmosphere of WASP-96b. First we compare the observed trans- 
mission spectrum to the models based on ref. !°, which included a self-consistent 
treatment of radiative transfer and chemical equilibrium of neutral and ionic 
species. Chemical mixing ratios and opacities were computed assuming solar 
metallicity and local chemical equilibrium, accounting for condensation and 
thermal ionization but not photoionization®!-. Transmission spectra were also 
calculated using one-dimensional T-P profiles for the dayside, as well as an overall 
cooler planetary-averaged profile. 

In addition to the atmospheric models of a clear atmosphere, a simplified treat- 
ment of the scattering and absorption have been incorporated in those models to 
simulate the effect of small particle haze aerosols and large particle cloud condensates
at optical and near-infrared wavelengths. In the case of haze, Rayleigh
scattering opacity ($\sigma = \sigma_0 (\lambda/\lambda_0)^{-4}$) has been assumed with a cross-section which
was 1,000× the cross-section of molecular hydrogen gas ($\sigma_0 = 5.31 \times 10^{-27}$ cm² at
$\lambda_0 = 3{,}500$ Å; ref. °'). To include the effects of a flat cloud deck we included a
wavelength-independent cross-section, which was a factor of 100× the cross-section of
molecular hydrogen gas at $\lambda_0 = 3{,}500$ Å (see Fig. 1).
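
The simplified haze and cloud treatment can be written down in a few lines. This is a sketch under the stated assumptions: a $\lambda^{-4}$ Rayleigh law anchored at the molecular hydrogen cross-section of $5.31 \times 10^{-27}$ cm² at 3,500 Å, a 1,000× enhancement for haze and a grey cross-section 100× the 3,500 Å hydrogen value for cloud. The function names are hypothetical and not part of the model code used in the paper.

```python
import numpy as np

SIGMA_0 = 5.31e-27   # cm^2, H2 Rayleigh cross-section at LAMBDA_0 (value from the text)
LAMBDA_0 = 3500.0    # angstrom

def h2_rayleigh(wavelength):
    """Molecular hydrogen Rayleigh cross-section (cm^2) at wavelength (angstrom)."""
    return SIGMA_0 * (np.asarray(wavelength, dtype=float) / LAMBDA_0) ** -4

def haze(wavelength, enhancement=1000.0):
    """Enhanced Rayleigh scattering: same slope, larger amplitude."""
    return enhancement * h2_rayleigh(wavelength)

def grey_cloud(wavelength, enhancement=100.0):
    """Wavelength-independent cross-section scaled to the H2 value at LAMBDA_0."""
    wavelength = np.asarray(wavelength, dtype=float)
    return np.full_like(wavelength, enhancement * SIGMA_0)
```

The haze cross-section falls by a factor of 16 between 3,500 Å and 7,000 Å, whereas the cloud cross-section stays constant, which is what produces a sloped versus a flat transmission spectrum.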

As in our previous studies, we obtained the average values of the models 
within the wavelength bins and fitted these theoretical values to the data with a 
single parameter responsible for their vertical offset®’***. The $\chi^2$ and Bayesian
Information Criterion statistic quantities were computed for each model with the
number of degrees of freedom for each model determined by $\nu = N - m$, where $N$
is the number of measurements and $m$ is the number of free parameters in the fit.
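
As an illustration of this comparison, the sketch below fits a single vertical offset between a binned model and the data and returns $\chi^2$, the Bayesian Information Criterion and $\nu = N - m$. It assumes Gaussian uncertainties, for which the best-fit offset has a closed form and the criterion can be written as $\chi^2 + m \ln N$; the function name is hypothetical.

```python
import numpy as np

def fit_offset(data, errors, model):
    """Fit data = model + offset and return (offset, chi2, bic, dof)."""
    data, errors, model = (np.asarray(a, dtype=float) for a in (data, errors, model))
    w = 1.0 / errors**2
    # Weighted mean of the residuals minimises chi-square analytically.
    offset = np.sum(w * (data - model)) / np.sum(w)
    chi2 = np.sum(w * (data - model - offset) ** 2)
    n, m = data.size, 1              # one free parameter: the vertical offset
    bic = chi2 + m * np.log(n)
    return offset, chi2, bic, n - m
```
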

We also compared the observed transmission spectrum around the Na D line
from 4,500 Å to 6,800 Å with cloud-free atmosphere models, assuming two
individual shapes of the pressure-broadened profile. The first profile had the shape
detailed in ref. * and the second line shape was taken from the formalism of
refs '8?6. We find no statistically significant difference between the two, with $\chi^2 = 22$
and $\chi^2 = 30$ for 35 degrees of freedom. Further observations at higher signal-to-noise
ratios and resolution will be necessary to distinguish between the two line-wing
shapes.
Retrieval analysis. We performed a retrieval*”™ analysis using the one-dimensional
radiative-convective equilibrium ATMO model***>-®. The code includes
isotropic multi-gas Rayleigh scattering, H2-H2 and H2-He collision-induced
absorption, as well as opacities for all major chemical species taken from the most
up-to-date high-temperature sources, including H2, He, H2O, CO2, CO, CH4, Na,
K, Li, Rb and Cs. Hazes and clouds are respectively treated as parameterized 
enhanced Rayleigh scattering and grey opacity”, which is similar to the approach 
of ref. '° and is detailed in ref. ®°. ATMO uses the correlated-k approximation with 
the random overlap method to compute the total gaseous mixture opacity, which 
has been shown to agree well with a full line-by-line treatment®. 

We first performed a free retrieval analysis, allowing a fit to the abundances 
of Na, K and Li, cloud and haze opacity, the planet’s radius and atmospheric tem- 
perature. Na, K and Li were included because these elements are expected to add 
substantial opacity in the wavelength region of our observations. We assumed that 
these gases are well mixed vertically in the atmosphere. 

For all free parameters in our model we adopted uniform priors with the following
ranges: 250-3,000 K for the temperature, 1 $R_J$ to 1.5 $R_J$ for the radius, 107° to 10°
for the cloud and 10~!° to 10° for the haze opacity. For the mixing ratios of chemical 
species other than H and He we adopted uniform priors between —2 and 12. Our 
retrieval analysis proceeded by first identifying the minimum $\chi^2$ solution using
nonlinear least-squares optimization and then marginalizing over the posterior 
distribution using differential-evolution Markov chain Monte Carlo’’. A total of 12 
chains for 30,000 steps each were run until the Gelman-Rubin statistic for each free 
parameter was within 1% of unity, showing that the chains were well mixed and had 
reached a steady state. We discarded a burn-in phase from all chains corresponding 
to the step at which all chains had found a $\chi^2$ below the median $\chi^2$ value of the
chain. Finally, we combined the remaining samples into a single chain, forming our 
posterior distributions. We summarize the results from the free retrieval in Fig. 2. 
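
The convergence criterion can be made concrete with the textbook form of the Gelman-Rubin statistic for a single parameter. This is a sketch of the standard definition, not the retrieval package's own routine, and the function name is hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (n_chains, n_steps), burn-in already removed.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # B: scaled variance of chain means
    var_hat = (n - 1) / n * within + between / n      # pooled variance estimate
    return np.sqrt(var_hat / within)

def converged(chains, tolerance=0.01):
    """True when the statistic is within `tolerance` of unity (1% in the text)."""
    return abs(gelman_rubin(chains) - 1.0) < tolerance
```

Chains that sample the same distribution give a value close to one, whereas chains stuck in different regions give a value well above one.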

Elemental abundances of Na and Li were obtained for the host star from 
high-resolution optical spectroscopy with a typical signal-to-noise ratio of about 
100:1, as detailed in ref. !°. Because the K resonance doublet is located outside 
the red limit of the high-resolution spectra, it was not possible to measure its 
abundance. 
The best-fitting retrieved spectrum suggests a cloud-free atmosphere at the limb
with a precise sodium abundance consistent with the solar value at about 1σ. The
spectrum shows no definitive evidence of the potassium feature. A measurement
at around 0.73 μm shows a slightly larger radius, though it is only about 1.8σ
above our best-fitting model. Our retrieval analysis shows a marginalized posterior
distribution for potassium with only an upper bound, with a median value consistent with the
solar value at approximately 2σ confidence (Figs. 1 and 2). This is most probably a
consequence of the limited wavelength range, covering only about two-thirds of the 
potassium feature, as well as the diluting effect of the O2-A band, which partially 
covers this wavelength regime. 

The limited constraining power of the K feature allows for a three-case scenario 
for the abundance of K in the atmosphere of WASP-96b. In all three cases the 
atmosphere of the planet is clear at the limb with sodium at solar abundance. In 
the first scenario, the abundance of potassium is sub-solar, leading to a missing 
potassium feature in the transmission spectrum. In chemical equilibrium, Na and 
K can form condensates (such as Na2S and KCl) at temperatures lower than about 
1,300 K and 1,000K, respectively. However, given that the retrieved temperature 
in the region probed by our observation lies between about 1,300 K and 1,700K, 
this is unlikely. In the second scenario, the potassium can be ionized by the ultra- 
violet radiation of the host star, which would also lead to missing potassium line 
cores. The pressure-broadened line wings would still be present, because they 
would originate from deeper layers, where the ultraviolet radiation would be unable
to penetrate. We estimate the relative sodium-to-potassium ratio from our free
retrieval analysis, finding a highly uncertain value, which prevents any definitive 
conclusion (Extended Data Fig. 6). Future observations of multiple transits could 
help resolve the two-case scenario. A third scenario would require a stratified 
atmosphere consisting of a low-altitude potassium layer with cloud cover on top, 
above which is located a clear atmosphere containing sodium at solar abundance. 
The sodium layer would need to be deep enough to allow for pressure-broadened 
line wings in the transmission spectrum. 

We note a large degeneracy between the aerosol clouds/hazes and the Na abun- 
dance, as higher cloud opacity levels can be fitted with increased Na abundances. 
This degeneracy can be seen in the posterior distribution of the retrieval (see 
Extended Data Fig. 8), and the marginalized posterior distributions are shown in 
Fig. 2 (that is, the distribution shown in Extended Data Fig. 8 integrated along each 
axis). The degeneracy is accounted for in our retrieval modelling by marginalizing 
the Na abundance over the other model fit parameters, which includes the effects 
of clouds and hazes. An aerosol-free model where the near-ultraviolet transmission
spectrum is dominated by H2 Rayleigh scattering helps us to determine the lower
limit to the Na abundance. The upper limit for the Na abundance is sensitive to the
line profile shape, as very high aerosol opacity levels probe much lower pressure
levels, affecting the line profile. For example, for a clear atmosphere the transmission
spectrum at 4,000 Å probes pressures of 20 mbar and is dominated by molecular
hydrogen at that wavelength. In a model where the haze opacity is 1,000× stronger
than molecular hydrogen (the pink haze model shown in Fig. 1), the pressure
probed is about 1,000× lower or about 0.02 mbar. These lower pressures affect the
calculation of the pressure-broadened sodium line profile. As described in ref. 3, 
using semi-classical impact theory the half-width of the Lorentzian sodium line 
core is determined by the effective collision frequency, which itself is a product 
of the H2-perturber density (among other factors). We find that in the described 
scenario with very high sodium abundances and high cloud/haze opacities, very 
low pressures (such as 0.02 mbar) are probed and the sodium line profile becomes 
too narrow to be able to fit the wide profile seen in the data, even at arbitrarily 
high abundances. 

We acknowledge that the one-dimensional model does not account for hori- 
zontal advection, which may affect the composition. This can break the chemical 
equilibrium condition, leading to horizontally constant non-equilibrium abundances
of various species, for example CH4 (ref. ””), which could potentially affect
the observed transmission spectrum. However, an analysis of such effects is beyond 
the scope of the present Letter. 

The presence of sodium and potassium in the atmospheric spectra of irradiated 
gas giant exoplanets has proved puzzling, with some of the spectra showing both or 
either of the two features without a clear trend associated with the planetary prop- 
erties. Primordial abundance variation along with atmospheric processes such as 
condensation, photochemistry and photoionization have been hypothesized to be 
responsible for the observed alkali variation. Elemental abundance constraints with 
ground-based instruments such as FORS2 could help identify the role of some of 
those processes from statistically large samples of exoplanet spectra. 

Exoplanet atmospheric metallicities have been estimated using the scaling rela- 
tion of atmospheric metallicity with the abundance of detected absorption features. 
At present, such estimations have been obtained using the 1.4-μm water absorption
feature. Recently, ref. ”* obtained such a measurement from combined HST + VLT
spectroscopy. We performed a retrieval analysis assuming chemical equilibrium in
the atmosphere and determined the metallicity of WASP-96b’s atmosphere, finding
consistency with the metallicity of the host star and the mass-metallicity trend
established for Solar System giant planets (Fig. 3). We acknowledge that such 
estimations can be highly inaccurate owing to the assumed scaling of metallicity
with elemental abundance, and we therefore round the precision of WASP-96b’s
metallicity to one significant figure. The current census of exoplanet metallicity
measurements exhibits a large scatter across the full mass-metallicity diagram,
which hampers any definitive conclusion regarding a trend. As in ref. “ we plot the
relative planet-to-star heavy-element enrichment as a function of the planet mass
instead of the planet metallicity alone (Extended Data Fig. 7). We convert $Z_p/Z_\odot$
to $Z_p/Z_*$, assuming a scaling approximation with the parent star’s iron abundance
of the form $Z_*/Z_\odot = 10^{[\mathrm{Fe/H}]}$, where $Z_\odot = 0.014$, and propagate the uncertainty of
[Fe/H]. It remains for future work to improve the statistics on the mass-metallicity 
diagram with additional precise measurements. We therefore consider it premature 
to propose any trends or emerging patterns or interpretation. 
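
The conversion to a planet-to-star enrichment can be sketched as below. The sketch assumes that the stellar metallicity scales as $Z_*/Z_\odot = 10^{[\mathrm{Fe/H}]}$ and that the uncertainties are propagated to first order and are uncorrelated; both the function name and the propagation formula are illustrative assumptions, not the authors' stated procedure.

```python
import math

def planet_to_star_enrichment(zp_over_zsun, zp_err, feh, feh_err):
    """Convert Z_p/Z_sun to Z_p/Z_* and propagate the uncertainties.

    zp_over_zsun, zp_err: planet metallicity relative to solar and its error.
    feh, feh_err: stellar [Fe/H] in dex and its error.
    """
    star = 10.0 ** feh                      # Z_*/Z_sun under the assumed scaling
    ratio = zp_over_zsun / star
    # Relative errors add in quadrature; d(10**x)/10**x = ln(10) dx.
    rel = math.hypot(zp_err / zp_over_zsun, math.log(10.0) * feh_err)
    return ratio, ratio * rel
```

For a star of solar iron abundance with no uncertainty the ratio and its error are unchanged, and every 0.1 dex of uncertainty in [Fe/H] contributes about 23% to the relative error.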

Code availability. Publicly available custom codes were used for the Gaussian process
modelling: George (http://dfm.io/george/current/user/gp/) and emcee
(http://github.com/dfm/emcee). The Markov chain Monte Carlo retrieval analyses were
performed using the publicly available package exofast
(http://astroutils.astronomy.ohio-state.edu/exofast). The ATMO code used to compute the atmosphere models
is currently proprietary. We have opted not to make the customized IDL codes used
to produce the spectra publicly available owing to their undocumented intricacies.

Data availability. The data are within the standard proprietary period of one year
and will become publicly available on the European Southern Observatory archive
in July and August 2018.

Extended Data Fig. 1 | VLT FORS2 stellar spectra and white-light
curves. Left and right panels show the GRIS600B (blue) and GRIS600RI
(red) datasets, respectively. The top row shows example stellar spectra
used for relative spectrophotometric calibration. The dashed lines indicate
the wavelength region used to produce the white-light curves. The second
row shows normalized raw light curves for both sources. The third row
shows normalized relative target-to-reference raw flux along with the
marginalized Gaussian process model (A), the detrended transit light
curve and model (B), and the common-mode correction (A/B). The fourth
row shows the best-fit light curve residuals and 1σ error bars, obtained
by subtracting the marginalized transit and systematics models from the
relative target-to-reference raw flux. The two light curve residuals show
dispersions of 78 and 201 parts per million, respectively.
Extended Data Fig. 2 | Spectrophotometric light curves from grism curves and the transit model with the highest statistical weight. The fourth
with the highest statistical weight. The third panel shows detrended light
Extended Data Fig. 5 | Transmission spectrum of WASP-96b. Indicated are the relative radius measurements from grism 600B (blue dots) and 600RI
(red dots) along with the 1σ uncertainties, compared to the same set of models as in Fig. 1.
Extended Data Fig. 6 | Na/K ratio. Histogram of the marginalized posterior distribution of the Na to K ratio for WASP-96b. Shown are the median and
1σ levels (orange continuous and dotted lines, respectively). The solar value is indicated by the blue continuous line.
Extended Data Table 1 | System parameters

Parameter Value
Orbital period (day) 3.4252602 (fixed)
Eccentricity 0 (fixed)

GRIS600B 4013 - 6173 Å
$T_{\mathrm{mid}}$ (JD) 57963.336727 900034
$i$ (°) 85.2179.17
$a/R_*$ 8.934918
$R_p/R_*$ 0.1141 60014
$u_1$ 0.39979 030
$u_2$ 0.14875546
A (ppm) 501779”
Newum (arbitrary normalisation) 1.01732,
n, (arbitrary normalisation) -3.451529
ny (arbitrary normalisation) -1.867?38
$c_0$ 0.999275 0016
$c_1$ -0.0006870 50012

GRIS600RI 5268 - 8308 Å
$T_{\mathrm{mid}}$ (JD) 57987.31195*9.00029
$i$ (°) 85.1179:13
$a/R_*$ 8.807313
$R_p/R_*$ 0.11727 5:9075
$u_1$ 0.2670 :58
$u_2$ 0.2373 -42
A (ppm) 23012 741°
Neas (arbitrary normalisation) 0.507383
1, (arbitrary normalisation) 2.815536
ny (arbitrary normalisation) 8.08732
$c_0$ 0.9980* 50018
$c_1$ 0.000187 555028

Weighted mean:
$i$ (°) 85.14 ± 0.10
$a/R_*$ 8.844508

GRIS600B (fixed $i$, $a/R_*$ and $T_{\mathrm{mid}}$)
$R_p/R_*$ 0.114775:0014
$u_1$ 0.43579 530
$u_2$ 0.17379 :032

GRIS600RI (fixed $i$, $a/R_*$ and $T_{\mathrm{mid}}$)
$R_p/R_*$ 0.116810-0015
$u_1$ 0.2827 5:932
$u_2$ 0.250*9 032
Extended Data Table 2 | Transmission spectrum

λ (Å) $R_p/R_*$ $u_1$ $u_2$
3500 – 4013 0.11479 ± 0.00152 0.484 ± 0.059 0.207
4013 – 4093 0.11447 ± 0.00182 0.553 ± 0.059 0.219
4093 – 4173 0.11382 ± 0.00167 0.539 ± 0.053 0.223
4173 – 4253 0.11554 ± 0.00167 0.493 ± 0.052 0.227
4253 – 4333 0.11459 ± 0.00205 0.458 ± 0.054 0.231
4333 – 4413 0.11321 ± 0.00163 0.464 ± 0.049 0.235
4413 – 4493 0.11505 ± 0.00129 0.479 ± 0.043 0.239
4493 – 4573 0.11466 ± 0.00123 0.522 ± 0.039 0.242
4573 – 4653 0.11462 ± 0.00125 0.415 ± 0.043 0.246
4653 – 4733 0.11361 ± 0.00116 0.422 ± 0.042 0.251
4733 – 4813 0.11282 ± 0.00100 0.404 ± 0.038 0.255
4813 – 4893 0.11324 ± 0.00130 0.385 ± 0.040 0.256
4893 – 4973 0.11440 ± 0.00104 0.337 ± 0.041 0.258
4973 – 5053 0.11451 ± 0.00138 0.367 ± 0.044 0.262
5053 – 5133 0.11326 ± 0.00112 0.410 ± 0.038 0.269
5133 – 5213 0.11318 ± 0.00101 0.288 ± 0.043 0.273
5213 – 5293 0.11317 ± 0.00096 0.336 ± 0.040 0.275
5293 – 5373 0.11444 ± 0.00089 0.260 ± 0.039 0.277
5373 – 5453 0.11365 ± 0.00095 0.349 ± 0.047 0.278
5453 – 5533 0.11361 ± 0.00089 0.299 ± 0.042 0.283
5533 – 5613 0.11322 ± 0.00102 0.330 ± 0.046 0.287
5613 – 5693 0.11541 ± 0.00095 0.303 ± 0.050 0.290
5693 – 5773 0.11499 ± 0.00095 0.284 ± 0.044 0.291
5773 – 5853 0.11653 ± 0.00084 0.263 ± 0.044 0.295
5853 – 5933 0.11685 ± 0.00092 0.270 ± 0.041 0.296
5933 – 6013 0.11648 ± 0.00086 0.236 ± 0.049 0.296
6013 – 6093 0.11607 ± 0.00086 0.242 ± 0.050 0.297
6093 – 6173 0.11545 ± 0.00081 0.169 ± 0.053 0.302
6173 – 6253 0.11487 ± 0.00096 0.316 ± 0.044 0.313
6253 – 6333 0.11648 ± 0.00099 0.235 ± 0.041 0.314
6333 – 6413 0.11457 ± 0.00096 0.194 ± 0.045 0.314
6413 – 6493 0.11499 ± 0.00091 0.185 ± 0.046 0.314
6493 – 6573 0.11553 ± 0.00100 0.267 ± 0.043 0.313
6573 – 6653 0.11284 ± 0.00089 0.250 ± 0.040 0.313
6653 – 6733 0.11532 ± 0.00081 0.192 ± 0.038 0.313
6733 – 6813 0.11442 ± 0.00117 0.155 ± 0.044 0.314
6813 – 6973 0.11189 ± 0.00078 0.087 ± 0.042 0.314
6973 – 7053 0.11311 ± 0.00090 0.100 ± 0.052 0.314
7053 – 7133 0.11483 ± 0.00096 0.129 ± 0.051 0.314
7133 – 7213 0.11428 ± 0.00085 0.104 ± 0.044 0.315
7213 – 7293 0.11463 ± 0.00097 0.156 ± 0.048 0.315
7293 – 7373 0.11572 ± 0.00094 0.277 ± 0.038 0.315
7373 – 7453 0.11357 ± 0.00081 0.102 ± 0.050 0.316
7453 – 7533 0.11405 ± 0.00096 0.263 ± 0.046 0.316
7533 – 7693 0.11473 ± 0.00075 0.147 ± 0.042 0.315
7693 – 7773 0.11312 ± 0.00109 0.190 ± 0.049 0.315
7773 – 7853 0.11341 ± 0.00096 0.189 ± 0.049 0.315
7853 – 7933 0.11336 ± 0.00123 0.175 ± 0.053 0.314
7933 – 8013 0.11287 ± 0.00117 0.214 ± 0.059 0.314
Fundamental limits to graphene plasmonics


G. X. Ni, A. S. McLeod, Z. Sun, L. Wang, L. Xiong, K. W. Post, S. S. Sunku, B.-Y. Jiang, J. Hone, C. R. Dean,
M. M. Fogler & D. N. Basov


Plasmon polaritons are hybrid excitations of light and mobile 
electrons that can confine the energy of long-wavelength radiation 
at the nanoscale. Plasmon polaritons may enable many enigmatic 
quantum effects, including lasing’, topological protection”? and 
dipole-forbidden absorption‘. A necessary condition for realizing 
such phenomena is a long plasmonic lifetime, which is notoriously 
difficult to achieve for highly confined modes®. Plasmon polaritons 
in graphene—hybrids of Dirac quasiparticles and infrared 
photons—provide a platform for exploring light-matter interaction 
at the nanoscale®’. However, plasmonic dissipation in graphene 
is substantial® and its fundamental limits remain undetermined. 
Here we use nanometre-scale infrared imaging to investigate 
propagating plasmon polaritons in high-mobility encapsulated 
graphene at cryogenic temperatures. In this regime, the propagation 
of plasmon polaritons is primarily restricted by the dielectric losses 
of the encapsulated layers, with a minor contribution from electron- 
phonon interactions. At liquid-nitrogen temperatures, the intrinsic 
plasmonic propagation length can exceed 10 micrometres, or 50 
plasmonic wavelengths, thus setting a record for highly confined 
and tunable polariton modes. Our nanoscale imaging results reveal 
the physics of plasmonic dissipation and will be instrumental in 
mitigating such losses in heterostructure engineering applications. 

Here we investigate plasmon polariton propagation and dissipation 
using near-field infrared microscopy, enabling a direct visualization of 
polaritonic standing waves on the surface of graphene and other van 
der Waals materials®”?°. Both the wavelength of plasmon polariton 
waves, $\lambda_p$, and their travel range, which quantifies their dissipation,
can be readily confirmed from nanoscale infrared images (Figs. 1, 2).
These two observables permit direct extraction of the response functions
of a plasmonic medium, including the complex conductivity
$\sigma(\omega) = \sigma' + i\sigma''$, as we detail below. The response functions encode
information on both interactions and scattering processes in an elec- 
tron liquid. Thus, near-field examination of plasmon polariton images 
facilitates inquiry into fundamental electronic phenomena of the media 
supporting these polaritons. As customary in solid-state physics, scat- 
tering processes of diverse origin can be distinguished by investigating 
their dependence on control parameters—in particular, temperature. 
This motivates a plasmonic imaging investigation of graphene at cryo- 
genic temperatures with the help of a newly developed apparatus!!. We 
examined a high-mobility micro-device constructed from a graphene 
monolayer encapsulated between two thin slabs of hexagonal boron 
nitride (hBN), with a graphene carrier density that is tunable through a 
silicon back gate (Fig. 1a, b). The encapsulation preserved the graphene 
in its pristine form. Our cryo-imaging approach to recording plas- 
mon polariton standing waves, together with a theoretical model free of 
adjustable parameters, has enabled us to identify the physics governing 
plasmon propagation in high-mobility graphene. 

Nanoscale imaging of plasmon polaritons is achieved using the fol- 
lowing experimental protocol. The metallized tip of an atomic force 
microscope (AFM) probe is illuminated by a focused laser field with 
an infrared frequency of $\omega$. The tip functions as an optical antenna
that intensifies the incident electric field at the apex. When the tip
is brought near the sample, this concentrated field can excite polaritonic
modes with a wavelength of $\lambda_p(\omega)$ that is much shorter than the
free-space wavelength $\lambda_{\mathrm{IR}} = 2\pi c/\omega$ of the incident light'?. The optical
antenna functionality of the AFM tip also facilitates scattering of local
near fields back into the far field, where they are registered by conventional
optical detection. Proper demodulation methods allow one to
isolate the back-scattered light component that is associated with the
local electric field confined to the area under the tip, thus achieving a
spatial resolution of the order of 10 nm (ref. !*). In such a near-field
imaging experiment, plasmon polaritons manifest as a periodic modulation
(fringes) of the observed near-field signal as a function of the
tip position. Similarly to imaging experiments at room temperature’,
fringes of two distinct periodicities, $\lambda_p$ and $\lambda_p/2$, appear in Figs. 1 and 2.
The $\lambda_p/2$-period fringes are due to plasmon polaritons that are emitted
by the tip and complete a round trip between the tip and the reflecting
sample edges. The $\lambda_p$-period fringes are produced by the interference
between tip-launched plasmon polaritons and the evanescent
component of the reflected ones!». Alternatively, the $\lambda_p$-period fringes

Fig. 1 | Nanoscale infrared imaging of surface plasmons in Au/hBN/graphene/hBN
encapsulated structures at cryogenic temperature.
a, Sketch of the layered Au/hBN/graphene/hBN/SiO2/Si heterostructure. The
lithographically defined gold (Au) microstructures on top of the hBN capping
layer serve as effective plasmonic launchers. b, Optical image of the device.
The black dashed rectangle marks the area shown in c. The entire field of view
is within the diffraction-limited spot of our infrared laser, which operates at
an energy of 886 cm$^{-1}$ (110 meV) or $\lambda_{\mathrm{IR}} = 11.28$ μm. c, Nanoscale infrared
image of plasmonic interference fringes from the encapsulated graphene
monolayer, expressed by the normalized scattering amplitude s acquired
at a back-gate voltage of 97 V and a temperature of T = 60 K. The arrows
represent the propagation direction of the plasmon waves. These experiments
simultaneously visualize the local electric field associated with interference
from plasmon polaritons emitted by the near-field probe and reflected by
sample edges, as well as from polaritons emitted by the gold microstructures
(labelled as Au).
Fig. 2 | Temperature- and gate-dependent
trends in surface plasmon propagation in
graphene. a, Nanoscale infrared images of the
normalized scattering amplitude s acquired at
sequential sample temperatures and gate voltages.
A gold electrode (labelled Au at the top of
the images) functions as an antenna that emits
graphene plasmons. b, Line profiles of plasmonic
interference fringes propagating from left to
right, as a function of the distance L from the gold
launcher. The variable attenuation of the
propagating waveform (solid curves) is
characteristic of the temperature-dependent
plasmon scattering rate. Dash-dotted lines
represent the results of numerical simulations
performed to identify the temperature
dependence of the complex plasmon wavevector
and associated scattering rate (see main text and
Methods). c, The temperature dependence of the
quality factor $Q_p$ for both $\lambda_p$- and $\lambda_p/2$-period
plasmon waves obtained from nanoscale infrared
images taken at $V_g = 75$ V. The red solid curve is
the theoretical model detailed in Methods. The
electronic mean free path $l_{\mathrm{mfp}}(T)$ is also plotted
for comparison. The error bars define the 95%
confidence intervals. d, The voltage ($V_g$) and
carrier density ($n$) dependence of the plasmon
can arise from plasmon polaritons launched by emitters other than the
tip that are present in the device®. The efficiency of the plasmon polar- 
iton emission depends on the dielectric polarizability, shape and size 
of the emitter. Specifically, graphene edges are weak emitters, but elon- 
gated metallic objects placed on graphene can act as efficient ones'®. In 
our structures, the gold contacts on top of the hBN-encapsulating layer 
act as fixed plasmonic antennas (Figs. 1, 2). 

We begin with a survey of a large-area (6 μm × 8 μm) image of plasmon
polariton standing waves obtained at T = 60 K (Fig. 1c) with an
infrared laser operating at $\lambda_{\mathrm{IR}} = 11.28$ μm and at a back-gate voltage
of $V_g = 97$ V. Here we display raw data in the form of the amplitude
s of the scattered field, normalized by the amplitude detected at the
gold contacts, whose optical response provides a convenient reference
that is independent of temperature and carrier density (see Methods
for details). The most prominent aspect of the image in Fig. 1c is that
the entire field of view is filled with plasmonic interference fringes.
The $\lambda_p/2$-period fringes are most evident on the right-hand side of the
field of view, close to a natural boundary of the graphene monolayer,
whereas $\lambda_p$-period fringes dominate in the vicinity of gold antennas.
Figure 1c reveals that plasmon polaritons are highly confined, with
$\lambda_{\mathrm{IR}}/\lambda_p > 60$; however, they travel over several micrometres, far exceeding
previous benchmark results at room temperature®">.

The nanoscale infrared images and corresponding line profiles in 
Fig. 2 attest to a dramatic reduction of plasmonic losses as the temper- 
ature of the graphene is reduced. Figure 2a shows detailed raster scans 
obtained in the vicinity of one of the gold antennas at $V_g = 75$ V and at
temperatures between 60 K and 300 K. When the gate voltage increases
above $V_g = 50$ V, plasmonic fringes emerge out of a featureless background,
as illustrated in a voltage-dependent scan shown in Extended
Data Fig. 3. At room temperature, plasmon polariton propagation is
restricted to distances of less than 1 μm. As the temperature is lowered,
we observe a systematic increase of both the overall travel range and the
number of detectable fringes. At T = 60 K plasmonic oscillations extend
beyond 4 μm at $V_g = 75$ V and beyond 5 μm at $V_g = 97$ V (Fig. 2a, b).
These overall trends are in accord with the temperature dependence of
the electronic mean free path, $l_{\mathrm{mfp}}$, in high-mobility devices prepared
under identical protocols!”. In fact, even the magnitudes of the plasmon
polariton travel range and of $l_{\mathrm{mfp}}$ appear to be on par with each
other: $l_{\mathrm{mfp}}$ increases from 1 μm at T = 300 K up to 8 μm at 60 K (Fig. 2c).
The latter value exceeds the entire field of view in Fig. 2a and is consistent
with the notion of ballistic electronic transport. The novelty of
the images in Fig. 2 is that they manifest the real-space behaviour of
graphene plasmon polaritons supported by ballistic electrons.

To quantify the quasiparticle dynamics underlying the propagation
of plasmon polaritons in graphene, we present an analysis based on the
complex momentum of the polariton, $q_p = q_p' + i q_p''$. The real part, $q_p'$,
defines the plasmon polariton wavelength through $\lambda_p = 2\pi/q_p'$, whereas
the imaginary part, $q_p''$, quantifies dissipation and the quality factor
$Q_p = q_p'/q_p''$ (Table 1). The complex momentum of plasmon polaritons,
and thus both $\lambda_p$ and $Q_p$, are governed by the optical conductivity of
graphene (see Methods section ‘Plasmons near a graphene edge’) or,
alternatively, can be obtained through numerical analysis of plasmonic
fringes (Methods section ‘Plasmonic damping rate analysis’). The evolution
of $\lambda_p$ with the gate voltage (or carrier density) is presented in
Fig. 2d and is in complete accord with the results of earlier studies at
room temperature®'*!>'7. A noteworthy aspect of our data is the rapid
growth of the quality factor at low temperatures (Fig. 2c).
Table 1 | Comparison of graphene plasmon parameters with those 
of metals (Ag) and doped semiconductors (n-doped InSb and CdO) 

Material                             Confinement ratio   Quality factor      Lifetime (fs)
Definition                           λ_IR/λ_p            Q_p = q_p'/q_p''    τ = 2Q_p/ω
Graphene (experimental, T = 60 K)    66                  130                 1,600
Graphene (intrinsic, T = 60 K)       66                  970                 12,000
Ag (T = 10 K)                        ≈1                  36                  14
n-InSb, n-CdO (T = 300 K)            <10                 37                  270

Parameters relevant for graphene entries: frequency, ω/(2πc) = 886 cm⁻¹; gate voltage, V_g = 75 V; 
plasmon wavelength, λ_p = 170 nm; and dielectric loss of the substrate, γ_env = 5.9 cm⁻¹ (see Methods 
for details). 

Data from ref, 2°. 

Data from ref, 3°. 

Data from ref. 31. 


This temperature dependence tracks the variation of the electronic 
mean free path. We therefore surmise that the evolution of Q_p(T) 
reflects the same fundamental physics as that governing electronic 
transport in graphene, but probed at finite frequency ω. The highest 
quality factor detected in our plasmonic imaging experiments is 
Q_p = 130. 
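
As a quick consistency check on the figures of merit in Table 1, the following Python sketch evaluates the definitions quoted in the text (confinement ratio λ_IR/λ_p, lifetime τ = 2Q_p/ω and propagation length 1/(2q_p'')) for the graphene parameters given in the table footnote. The function name and dictionary keys are illustrative and are not taken from the authors' analysis code.

```python
import math

C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def plasmon_figures(wavenumber_cm, lambda_p_nm, q_factor):
    """Figures of merit from the definitions in the text.

    wavenumber_cm : excitation frequency omega/(2*pi*c) in cm^-1
    lambda_p_nm   : plasmon wavelength in nm
    q_factor      : quality factor Q_p = q_p'/q_p''
    """
    lambda_ir_nm = 1e7 / wavenumber_cm                 # free-space wavelength in nm
    omega = 2 * math.pi * C_CM_PER_S * wavenumber_cm   # angular frequency in rad/s
    q_real = 2 * math.pi / lambda_p_nm                 # q_p' in nm^-1
    q_imag = q_real / q_factor                         # q_p'' in nm^-1
    return {
        "confinement": lambda_ir_nm / lambda_p_nm,     # lambda_IR / lambda_p
        "lifetime_fs": 2 * q_factor / omega * 1e15,    # tau = 2 Q_p / omega
        "prop_length_um": 1 / (2 * q_imag) / 1000,     # 1 / (2 q_p'') in micrometres
    }


measured = plasmon_figures(886.0, 170.0, 130.0)   # experimental Q_p at T = 60 K
intrinsic = plasmon_figures(886.0, 170.0, 970.0)  # intrinsic Q_p at T = 60 K
print(measured)
print(intrinsic)
```

With these inputs the confinement ratio comes out at about 66, the measured lifetime at roughly 1.6 ps and the intrinsic propagation length at about 13 μm, in line with the rounded values in Table 1 and in the main text.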

A detailed quantitative analysis of finite-frequency dissipation of 
Dirac quasiparticles requires that intrinsic dissipation in graphene 
is distinguished from extrinsic losses associated with the dielectric 
environment of graphene. We outline this procedure in Methods 
section ‘Plasmonic damping rate analysis’. In Fig. 3a we show the 
plasmonic scattering rate γ(T) = ωσ'/σ'', which quantifies intrinsic 
dissipation originating from within the graphene monolayer. 
Inspection of these data reveals a dramatic drop in the intrinsic plasmonic 
scattering rate from about 20 cm⁻¹ at room temperature to 
<2.0 cm⁻¹ at T = 60 K (we note the frequency unit conversion rule: 
1 cm⁻¹ = 30 GHz = 1.43 K). The corresponding direct current (d.c.) 
scattering rate measured for a similar encapsulated device is also plot- 
ted in Fig. 3a, showing the same overall trend in temperature depend- 
ence. Another important finding reported here is that the plasmonic 
scattering rate exceeds the transport scattering rate by a factor of two 
at room temperature. Notably, the same physical model (blue and red 
lines in Fig. 3a), which is free of adjustable parameters, yields a consist- 
ent description of the dissipation encountered by Dirac quasiparticles 
in both d.c. transport and at a finite frequency in the form of collective 
plasmon polaritons. 
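
The unit conversion rule quoted above (1 cm⁻¹ = 30 GHz = 1.43 K) can be reproduced from the fundamental constants. The short Python sketch below does this; the function name is hypothetical and the constants are the standard CODATA values.

```python
C_CM_PER_S = 2.99792458e10    # speed of light in cm/s
H_EV_S = 4.135667696e-15      # Planck constant in eV s
KB_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def convert_wavenumber(nu_cm):
    """Express a wavenumber (cm^-1) as a frequency, a temperature and an energy."""
    freq_hz = C_CM_PER_S * nu_cm    # f = c * wavenumber
    energy_ev = H_EV_S * freq_hz    # E = h f
    return {
        "GHz": freq_hz / 1e9,
        "K": energy_ev / KB_EV_PER_K,   # T = E / k_B
        "meV": energy_ev * 1e3,
    }


unit = convert_wavenumber(1.0)       # the conversion rule itself
probe = convert_wavenumber(886.0)    # the imaging frequency used in this work
print(unit)
print(probe)
```

The same function gives about 110 meV for the 886 cm⁻¹ imaging frequency, matching the photon energy quoted in Methods.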

We now outline the key elements that contribute to d.c. and plas- 
monic scattering in our model. A substantial contribution to plasmon 
polariton losses is from acoustic phonons, as reported in refs '*'. It is 
practical to discriminate between two distinct loss channels associated 
with acoustic phonons, which in the related literature are commonly 
referred to as the deformation potential effect and the pseudo-magnetic 
field effect. The deformation potential effect shifts the energy of Dirac 
bands (Fig. 3d), whereas the pseudo-magnetic field effect displaces the 
overall momentum of Dirac cones in opposite directions for the two 
valleys of the Brillouin zone at K and K’ (see Fig. 3d and Methods sec- 
tion ‘Plasmonic damping rate analysis’). First-principle calculations 
show that the deformation potential channel is vanishingly small?°”!, 
leaving the pseudo-magnetic field component as the dominant acoustic 
scatterer. However, the pseudo-magnetic field effect yields a conventional 
linear temperature dependence of the quasiparticle scattering 
rate that is nearly indistinguishable for d.c. scattering and plasmonic 
scattering at ω = 886 cm⁻¹ (orange and blue lines in Fig. 3b). This finding 
prompted a search for scattering mechanisms with a pronounced 
frequency dependence. Such a contribution emerges from inter-valley 
electron-phonon scattering, with scattering rate γ_K (black and 
red lines in Fig. 3b). The matrix element used to numerically evaluate 
γ_K requires the knowledge of both the energy of the zone-boundary 
optical phonon of graphene, ω_K, and the electron-phonon coupling 
constant, G_K, for which we used the results of prior first-principles 
calculations”° (Methods section ‘Plasmonic damping rate analysis’). The 
physical mechanism behind the frequency dependence of γ_K is the 
thermally activated nature of phonon-mediated inter-valley scattering, 
which yields a scattering rate proportional to exp[−(ω_K − ω)/T], where 
ω_K = 1,200 cm⁻¹. An additional small contribution to plasmon polariton 
damping comes from electron-electron scattering, with scattering 
rate γ_ee (magenta line in Fig. 3b). The net result is that three added 
channels—the pseudo-magnetic field effect, inter-valley scattering 
and electron-electron scattering—reproduce the observed tempera- 
ture dependence of both d.c. and plasmonic scattering measured for 
ultrahigh-mobility encapsulated graphene. We stress that our scattering 
rate analysis involves accurate evaluation of the temperature-dependent 
losses in hBN microcrystals that are nominally identical to those used 
in the actual graphene structures (Methods section ‘The optical constants 
of hBN’). This procedure enabled the accurate decomposition 
of various scattering mechanisms and thus enhanced the reliability of 
our findings. 
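
To illustrate why the thermally activated inter-valley channel is so sensitive to both frequency and temperature, the Python sketch below evaluates only the exponential factor exp[−(ω_K − ω)/T] quoted above, with ω_K = 1,200 cm⁻¹ and the conversion 1 cm⁻¹ = 1.4388 K. Prefactors and matrix elements are omitted, and using ω = 0 as a stand-in for the d.c. limit is an assumption made here purely for comparison.

```python
import math

K_PER_CM = 1.4388  # temperature equivalent of 1 cm^-1, in kelvin


def activation_factor(omega_cm, temp_k, omega_k_cm=1200.0):
    """Exponential factor exp[-(omega_K - omega)/T], with energies in cm^-1."""
    return math.exp(-(omega_k_cm - omega_cm) * K_PER_CM / temp_k)


# Frequency dependence: plasmonic (886 cm^-1) versus d.c. (omega = 0) at 300 K.
ratio_plasmon_to_dc = activation_factor(886.0, 300.0) / activation_factor(0.0, 300.0)

# Temperature dependence at the plasmon frequency: 300 K versus 60 K.
ratio_warm_to_cold = activation_factor(886.0, 300.0) / activation_factor(886.0, 60.0)

print(ratio_plasmon_to_dc, ratio_warm_to_cold)
```

Within this simplified picture the activation factor at the plasmon frequency is about 70 times larger than in the d.c. limit at room temperature, and it falls by a factor of roughly 400 on cooling from 300 K to 60 K.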

Our experimental results, together with our unified parameter-free 
theoretical analysis, account for quasiparticle dynamics in ultrahigh-mobility 
graphene, even at finite frequencies ω relevant for 
graphene plasmonics. Our experimental values of the quality factor Q_p 
and the lifetime τ of plasmons in graphene, extracted from cryogenic 
imaging, are higher than those of any other materials used for infrared 
plasmonics (Table 1). Although limited by acoustic-phonon scattering, 
the intrinsic propagation length l_p = 1/(2q_p'') of plasmon polaritons in 
our high-mobility structures can exceed 13 μm at low temperatures (the 
intrinsic quality factor is Q_p > 970; Table 1). In this regime, transport 


Fig. 3 | Plasmonic and electronic transport in high-mobility graphene. 
a, Temperature dependence of the plasmonic scattering rate 
(squares and circles) and of the d.c. scattering rate (diamonds). Solid lines 
display the results of our parameter-free modelling. The error bars represent 
95% confidence intervals. b, Key contributions to plasmonic and electronic 
scattering and their temperature dependences. c, Histograms displaying the 
contribution of different scattering channels at T = 60 K (left) and 300 K 
(right). d, Schematics of transformations of graphene electronic structure 
associated with the deformation potential (DP), pseudo-magnetic field (PF) 
and inter-valley scattering (IS). The arrows denote the possible scattering 
path across the K and K’ valleys. 

of plasmon polaritons is dominated by the dielectric environment and 
can be limited by the physical dimensions of macroscopic devices. 
Although losses associated with the encapsulating hBN layers and SiO2 
are extrinsic to graphene, dielectrics are practically unavoidable in 
actual high-mobility graphene devices. At ambient conditions, losses 
from (1) the dielectric layers, (2) acoustic-phonon scattering and (3) 
inter-valley scattering all sum together with unequal contributions. The 
dielectric losses can be reduced by 20% by using a thicker hBN bottom 
layer to increase the distance between the lossier SiO2 and graphene. 
However, the other two mechanisms are more difficult to circumvent. 
We stress that the contribution of electron-electron scattering to 
finite-frequency plasmon damping is small, even though the electron-electron 
collision rate Γ_ee is comparable to the electron-phonon scattering 
rate (see Fig. 3b and Methods section ‘Plasmonic damping rate 
analysis’). In general, electronic interactions can influence the conduc- 
tivity only if the Galilean invariance of the system is broken, which is a 
weak effect in our high-carrier-density graphene at low temperature”. 
Our data imply that the contribution of the electron-electron interac- 
tion to plasmonic responses is more prominent at higher temperatures, 
at lower carrier densities or in samples of special geometry. However, 
a major change in the electrodynamic response should occur at frequencies 
below the electron-electron collision rate Γ_ee. In that regime 
(called the hydrodynamic regime), which has been recently studied in 
d.c. transport experiments”*4, the plasmons are expected to become 
overdamped by another type of collective mode: energy waves or 
‘demon’ modes**”*. These modes could be studied using terahertz near-field 
microscopy combined with photocurrent detection”®. Finally, we 
note that the extrinsic plasmonic de-phasing is surprisingly weak in 
many macroscopic areas of Fig. 1c on which we focused our analysis, 
whereas its presence is obvious in other areas. These remaining extrin- 
sic limitations on plasmon polariton propagation, including the possi- 
ble role of sample boundaries”’, will be a subject of future nanoscale 
imaging studies. Nevertheless, this first report of plasmonic nanoscale 
imaging studies of encapsulated graphene at cryogenic temperatures 
demonstrates potential for the exploration of plasmonic switching and 
nonlinear phenomena in ultrahigh-mobility graphene at the liquid- 
nitrogen temperatures that are typically used for mid-infrared technol- 
ogies. Besides imposing fundamental limits on plasmon damping, the 
electron-phonon interaction probed here may have a role in other 
many-body phenomena, such as the recently discovered superconduc- 
tivity in twisted bilayer graphene”®. 
METHODS 
Device fabrication and characterization. Our high-quality hBN/graphene/hBN 
devices were fabricated using a polymer-free multilevel stacking dry-transfer process 
to form the encapsulated graphene structure”. Briefly, a very thin (7 nm) 
hBN crystal was used to pick up single-layer graphene and another hBN crystal 
in sequence. Bubble- and wrinkle-free stacks were then transferred onto SiO2/Si 
substrates. For the etching process, only a small portion of the graphene edge was 
exposed for edge-contact deposition (Extended Data Fig. 1). Thus, most of the 
graphene specimen remained intact. This technique is essential for maintaining 
graphene microcrystals free of contamination, bubbles and wrinkles. Consequently, 
the charge neutrality point of graphene in these samples is very close to zero gate 
voltage at ambient conditions, as confirmed from our near-field gating experiments 
(Extended Data Fig. 1). After multi-stacking transfer, metal contacts were fabri- 
cated at the exposed graphene edge. We note that graphene was entirely encapsu- 
lated within the hBN flakes. 

Electrical contacts to graphene were provided by a one-dimensional metallic 
edge of deposited gold, permitting completely polymer-free assembly of the device 
layers’. Together with a silicon back gate fabricated under 285 nm of thermal 
oxide, these contacts enabled in situ tuning of the graphene carrier density to 
values as high as 7 × 10¹² cm⁻² for electrons and 5 × 10¹² cm⁻² for holes during 
the imaging experiment. During our nanoscale infrared imaging experiments, 
these metallic contacts were also used as effective emitters of graphene plasmons 
through their efficient in-coupling (scattering) of incident infrared light into 
polaritons, as reported previously!® and as detailed below. Our device includes a 
thin, 7-nm-thick top layer of hBN to facilitate nanoscale infrared imaging of the 
embedded graphene monolayer owing to the finite penetration depth of electric 
near-fields and subsurface detection capabilities of nanoscale infrared imaging*!°. 
The hBN is optically transparent at our imaging energy of 886 cm⁻¹ (110 meV), 
ensuring that our nanoscale infrared images strictly document the plasmonic 
optical response of graphene. 
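
The carrier densities quoted above can be cross-checked against a simple parallel-plate estimate of the back-gate capacitance. The Python sketch below assumes a relative permittivity of 3.9 for the 285-nm thermal oxide and neglects the thin bottom hBN layer in series with it; both are assumptions made here for illustration, and the function name is hypothetical.

```python
EPS0 = 8.8541878128e-12     # vacuum permittivity in F/m
E_CHARGE = 1.602176634e-19  # elementary charge in C


def gate_induced_density_cm2(v_gate, v_cnp=0.0, oxide_nm=285.0, eps_r=3.9):
    """Carrier density (cm^-2) induced by a back gate in a parallel-plate model.

    v_cnp is the charge-neutrality gate voltage (taken as zero here because the
    text reports that the neutrality point lies very close to zero gate voltage).
    eps_r = 3.9 for SiO2 is an assumed textbook value, not a fitted parameter.
    """
    cap_per_area = EPS0 * eps_r / (oxide_nm * 1e-9)  # F/m^2
    density_m2 = cap_per_area * (v_gate - v_cnp) / E_CHARGE
    return density_m2 * 1e-4                         # convert m^-2 to cm^-2


n_75 = gate_induced_density_cm2(75.0)
n_97 = gate_induced_density_cm2(97.0)
print(f"{n_75:.2e} cm^-2 at 75 V, {n_97:.2e} cm^-2 at 97 V")
```

Under these assumptions the largest gate voltage used in the main text, 97 V, corresponds to a little over 7 × 10¹² cm⁻², consistent with the maximum electron density stated above.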

The AFM topography image of the device is shown in Extended Data Fig. 1a. 
Extended Data Fig. 1b and c show room-temperature near-field images obtained 
at gate voltages V_g = 0 V and 50 V, respectively. In Extended Data Fig. 1b, the near-field 
signal shows practically no contrast resulting from the presence of graphene crystal. 
This confirms that the encapsulation by hBN kept graphene nearly charge-neutral 
at ambient conditions. By contrast, Extended Data Fig. 1c clearly shows a rectan- 
gular graphene shape due to infrared contrast arising from the overall conductivity 
enhancement under gating. Moreover, multiple plasmonic fringes along the Au 
antenna and graphene edges can be detected, which are discussed in the main text 
and also in ‘Plasmons near a graphene edge’ below. 

Cryogenic infrared nanoscale imaging experiments. The nanoscale infra- 
red imaging experiments were performed using a home-built scattering-type 
scanning near-field optical microscope (SNOM) operating with variable sample 
temperatures of 20-450 K. All measurements were conducted under ultrahigh 
vacuum conditions at a pressure lower than 10~° mbar to prevent sample-surface 
contamination. Further technical details of the microscope are documented in 
ref. ''. The cryogenic scattering-type SNOM is equipped with continuous-wave 
mid-infrared quantum cascade lasers (DRS Daylight Solutions) and CO2 lasers 
(Access Laser). The scattering-type SNOM is based on an AFM, which in the 
present experiments operated in non-contact mode using cantilevered metallic 
AFM probes with a tip apex radius of ~25 nm and tapping frequencies of ~270 kHz. 
A pseudo-heterodyne interferometric detection module is implemented to extract 
both the scattering amplitude s and the phase of the near-field signal; here, we dis- 
cuss only the former. To suppress background contributions to the back-scattered 
near-field signal, we demodulated the detected signal at the third harmonic of the 
probe tapping frequency. 

Plasmons near a graphene edge. In the main text, we focused on plasmons 
launched by the Au pad. In this section, we present temperature-dependent data 
for plasmons launched by the tip of the scattering-type SNOM near the graphene 
edges. Extended Data Fig. 2a shows raster-scanned images of the edge regions at 
a gate voltage of V_g = 75 V, taken over a temperature range of T = 60-300 K. At 
T = 300 K, plasmons propagate to about 1 μm from the edge of the sample, consistent 
with the earlier experiments®!°. As the temperature is lowered, both the 
range and the number of detectable plasmonic fringes increase. At T= 60 K, the 
number of plasmonic oscillations almost doubles compared to the room temper- 
ature. This behaviour is consistent with what is observed for plasmons launched 
by the Au pad (Fig. 2). 

Whereas images of the Au-launched plasmons show oscillations with a single 
period, λ_p, the tip-launched plasmons near the edge exhibit two distinct periodicities, 
λ_p and λ_p/2, as a function of the tip-edge distance L’. According to previous 
studies*’®, the λ_p/2 period is due to the standing wave produced by plasmons 
making a round-trip from the tip to the edge and back. In addition to the standing 
wave, the total electric field in the system contains a component that does 
not oscillate, but instead decays as a power-law of the distance x from the edge. 
This part of the total field, which exists because of the long-range nature of the 
Coulomb interaction, creates the λ_p-period oscillations as a function of the tip-edge 
separation L’ in the images. 

To model the scattering-type SNOM images shown in Extended Data Fig. 2a 
quantitatively, we used an electromagnetic solver described previously”. This 
software uses a boundary-element method to solve the integral equation of the 
convolution type 

\hat{\epsilon}\,\Phi = \Phi_{\mathrm{tip}} \qquad (1)

for the quasi-static scalar potential Φ(x, y) induced in graphene in response to 
the external perturbation, which comprises the incident field plus the potential 
Φ_tip(x, y) created by the charges on the tip (the hat denotes an integral operator). 
The potential Φ_tip is not known a priori; it depends on Φ, on the incident field 
and the tip geometry in a self-consistent manner (see refs °°"). In our model, the 
tip is approximated by a metallic spheroid”! and the incident field is taken to be 
uniform. 

The kernel of equation (1) has the physical meaning of the two-dimensional 
dielectric function of graphene, which has the well known momentum-space 
representation 
are the complex plasmon wavenumber and the sheet conductivity of graphene, 
respectively, and κ is the effective dielectric function of the graphene environment; 
see ‘Plasmonic damping rate analysis’. We note that the plasmon wavelength λ_p and 
the plasmon quality factor Q are given by 
Our software code solves equation (1) and the equations relating Φ_tip and Φ 
numerically and subsequently computes the observable scattering-type SNOM 
signal as a function of the tip-edge distance. 

To fit the experimental data with these simulations, we adopted the Drude 
model of the conductivity 
where the Drude weight 

D = \frac{2e^2}{\hbar^2}\,T\,\ln\!\left[2\cosh\!\left(\frac{\mu}{2T}\right)\right] \qquad (5)

and the scattering rate γ are adjustable parameters. In equation (5), ħ is the Planck 
constant and μ is the chemical potential. Our best fits of the computed line profiles 
to the experimental data are shown in Extended Data Fig. 2b. The physical origin 
of γ is discussed in Methods section ‘Plasmonic damping rate analysis’. 

Gate dependence of the plasmon fringes. Extended Data Fig. 3 illustrates the gat- 
ing-induced carrier-density dependence of nanoscale infrared images at cryogenic 
temperature. These data are presented in the form of a two-dimensional map of the 
near-field amplitude s as a function of the gate voltage and the distance L from the 
Au launcher. These data were acquired by repeated scanning along the same linear 
path for a set of increasing gate voltages. The direction of the scan was normal to 
the Au launcher. The data shown in Extended Data Fig. 3 encode information 
about the variation of λ_p and γ with electrostatic doping. For example, the period 
of the plasmon fringes, which is equal to λ_p, is seen to monotonically increase with 
V_g. The procedure used for the quantitative determination of λ_p and γ is detailed 
in ‘Simulations of plasmons launched by the Au pad’. 

The optical constants of hBN. The optical constants of hBN in infrared frequen- 
cies are governed by its optical phonon modes. To the best of our knowledge, 
there are no previously published data on the hBN phonon parameters at low 
temperatures. We have obtained this information by measuring the reflectance 
R(w) of exfoliated hBN micro-crystals on SiO2/Si substrates. Because of the lim- 
ited sample area, these experiments were performed using a diffraction-limited 
infrared microscope, which imposed a low-frequency cut-off for these far-field 
reflectance measurements. These experiments were done under high vacuum con- 
ditions. Extended Data Fig. 4 shows the optical image of the hBN flake under study. 
The flake thickness is ~17 nm, which is close to the thickness of the bottom hBN 
layer in our hBN/graphene/hBN device structures. Extended Data Fig. 5 shows the 
R(ω) value of hBN/SiO2/Si multilayer structures at different temperatures normalized 
to that of a gold reference. As one can see, R(ω) is dominated by two resonance 
peaks, one at ~1,070 cm⁻¹ due to SiO2 phonons and the other at ~1,370 cm⁻¹ due 
to hBN in-plane phonons. Extraction of the phonon parameters was done with the 
help of the RefFIT package*, which fits R(ω) spectra using the multi-layer Fresnel 
reflection formulas, with every layer modelled with Lorentzian oscillators. The 
response of amorphous SiO2 on a Si substrate is complex and, according to earlier 
studies, can be described with multiple Lorentzians as 
where ω_j is the transverse optical phonon frequency of the jth phonon species and ε_∞ 
is the high-frequency dielectric constant. The parameters ω_j, ω_p,j and γ_j for the 
room-temperature case are given in Extended Data Table 1. The corresponding 
fits to the reflectance data are shown by the red dashed lines in Extended Data 
Fig. 5. For the in- and out-of-plane permittivities of hBN we found it sufficient to 
use the single-Lorentzian model 
Extended Data Fig. 6 shows the obtained linewidth γ_ab and frequency ω_ab of the 
in-plane phonon. In particular, the linewidth is found to vary linearly with T, 
decreasing from ~9 cm⁻¹ at room temperature to ~5 cm⁻¹ at cryogenic temperatures. 
This temperature dependence has been included in our theoretical estimation 
of the dielectric losses (Methods section ‘Plasmonic damping rate analysis’). 
Unfortunately, sample thickness prohibits the evaluation of the out-of-plane 
permittivity of hBN from these R(ω) results. Instead, we employed the Fourier-transform 
infrared nanospectroscopy (nano-FTIR) technique to determine ε_c 
(see Methods section ‘Plasmonic damping rate analysis’ and Extended Data Table 2). 
Simulations of plasmons launched by the Au pad. The electromagnetic solver 
described in ‘Gate dependence of the plasmon fringes’ gives an accurate representa- 
tion of the total near-field signal near the edge of graphene. However, it is more 
difficult to implement the same approach for a system that contains three conduc- 
tors, each of a nontrivial shape: a graphene micro-crystal, an elongated tip and a 
box-like Au pad. Fortunately, this is not necessary. For the purpose of extracting 
the plasmon wavelength λ_p and the quality factor Q, we only need to consider the 
long-range plasmonic field launched by the pad. We assume that in the interior of 
the graphene flake, the tip acts as a passive detector and that the near-field signal 
that it registers is directly proportional to the normal component E_z of the electric 
field underneath the tip. The first assumption is reasonable if the pad is a more 
efficient plasmon launcher than the tip. This is corroborated by the fact that the 
plasmonic fringes are dominated by simple λ_p-period oscillations instead of ‘double 
peaks’, as in Extended Data Fig. 2. The second assumption is not strictly true, but 
it is very common in the literature®?**. On the basis of these considerations, we 
included only the Au pad and graphene in our model. 

The simulation was performed using a publicly available software tool 
that implements the boundary-element method to solve electrodynamic prob- 
lems involving nanoparticles. In our simulations the pad’s dimensions were 
1,000 nm x 500nm x 60nm. The last two values (the width and thickness) matched 
the physical ones. However, the first dimension (the length of the pad in the x direc- 
tion) was shorter than that of the actual device because we found it computationally 
prohibitive to simulate such a many-micrometre-long object on our computer (a 
desktop with a memory of 16 GB and a clock speed of 2.7 GHz). We believe that 
this reduction in the length of the pad in our model is still reasonable because the 
pad’s length controls mostly the antenna effect (that is, the overall magnitude of 
the field at its leading edge), whereas we want to reproduce the profile of the field 
distribution—not its absolute magnitude. We also rounded the pad’s corners using 
a curvature radius of 60 nm to obtain a smaller boundary-element method mesh 
(see Extended Data Fig. 7a) and speed up the computation. 

The permittivity of Au in our model was taken to be €4, = (1.0 + 1.0) x 10° 
but the results were insensitive to this parameter as long as it was large enough by 
absolute value. We modelled graphene as an infinite planar slab of thickness 
d_g = 0.1 nm and three-dimensional permittivity ε_g = −coth(q_p d_g/2), which is 
related to the two-dimensional dielectric function introduced in Methods section 
‘Plasmons near a graphene edge’ as ε_g ≃ 2ε(q, ω)/(q d_g). This relation ensures 
that the slab supports propagating waves of momentum q_p. The results for the 
quantity of interest, which is the electric field just above the graphene, were practically 
independent of the arbitrary parameter d_g as long as d_g was small enough. The 
hBN substrate was not explicitly included in the simulation, but its effect was 
accounted for implicitly by using the proper value of q_p. The external field was taken 
to be uniform and linearly polarized in the y direction (Extended Data Fig. 7a). Using 
this model, we numerically calculated the field component E_z within a 
4,000-nm? square region just above the graphene layer. An example of such a 
two-dimensional map is shown in the false-colour image of Extended Data Fig. 7b 
for the simulation parameters λ_p = 170 nm and Q = 130, which are the ones that 
provide the best fit to the experimental data obtained at V_g = 75 V and T = 60 K. 
The qualitative features are similar to those seen in Fig. 2. Namely, the plasmon 
oscillations display a generic wave pattern that starts as a plane wave near the Au 
launcher, then turns into a Fresnel diffraction pattern further away and afterwards 
gradually transforms into a decaying cylindrical wave modulated by the Fraunhofer 
diffraction features; see Extended Data Fig. 7b. To avoid complications in the fitting 
procedure by these plasmonic diffraction effects and to suppress the influence of 
statistical noise in the experimental data, we averaged multiple line traces from both 
the simulation and the experimental two-dimensional maps over the strip 
x = −300 nm to x = 300 nm. This averaging gave us smooth line traces suitable for 
fitting. Our best fit for the above values of V_g and T is illustrated in Extended Data 
Fig. 7c. The red trace is the data and the blue trace is the calculated, strip-averaged 
E_z. The same fit can be also found in Fig. 2, together with those for the remaining 
data. To align the phases of the two curves in Extended Data Fig. 7c, we used a phase 
shift δ. However, this additional adjustable parameter does not affect our main goal 
of extracting Q, which we achieved by analysing the decay of the absolute value of 
E_z, that is, the envelope of the oscillations. The fit in Extended Data Fig. 7c is clearly 
good, as shown in Fig. 2. Hence, in our subsequent analysis we relied on the deduced 
Q, as described in ‘Electron scattering by the acoustic phonons of graphene’. 
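
The idea of extracting Q from the decay of the oscillation envelope after strip averaging can be illustrated with a deliberately simplified Python sketch. It replaces the boundary-element simulation with a synthetic damped plane wave that is uniform along x, so diffraction and noise are absent; the function names, grid spacing and the peak-picking procedure are assumptions made here and do not reproduce the fitting code used in the paper.

```python
import numpy as np


def strip_averaged_trace(lambda_p_nm, q_factor, x_nm, y_nm):
    """Synthetic field map (uniform along x) averaged over the strip of x values."""
    q_real = 2 * np.pi / lambda_p_nm          # q_p'
    q_imag = q_real / q_factor                # q_p'' = q_p' / Q
    trace = np.exp(-q_imag * y_nm) * np.cos(q_real * y_nm)
    field_map = np.tile(trace, (x_nm.size, 1))  # one identical row per x position
    return field_map.mean(axis=0)               # average over the strip


def quality_factor_from_envelope(y_nm, trace):
    """Estimate Q from the exponential decay of the peaks of |trace|."""
    mag = np.abs(trace)
    interior = mag[1:-1]
    is_peak = (interior > mag[:-2]) & (interior >= mag[2:])  # local maxima
    y_peaks = y_nm[1:-1][is_peak]
    a_peaks = interior[is_peak]
    slope, _ = np.polyfit(y_peaks, np.log(a_peaks), 1)  # log-envelope slope = -q_p''
    q_imag = -slope
    q_real = np.pi / np.mean(np.diff(y_peaks))  # peaks of |trace| are lambda_p/2 apart
    return q_real / q_imag


x_nm = np.arange(-300.0, 301.0, 10.0)  # strip from x = -300 nm to x = 300 nm
y_nm = np.arange(0.0, 4001.0, 1.0)     # distance from the launcher, 1 nm steps
trace = strip_averaged_trace(170.0, 130.0, x_nm, y_nm)
q_est = quality_factor_from_envelope(y_nm, trace)
print(q_est)
```

For real data the envelope would also contain the geometric decay and diffraction features described above, which is why the paper compares the measurements with a full simulation rather than a bare exponential.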
Plasmonic damping rate analysis. Losses in the dielectric environment of graphene. 
According to equations (2) and (3), the reciprocal of the plasmonic quality factor 
Q can be split into two additive components 
In this section we discuss the second term, which represents losses in the 
dielectric environment of graphene. To evaluate such losses that are inherent to 
our encapsulated structures, we need to find the effective permittivity 
κ = κ'(q, ω) + iκ''(q, ω). This requires solving Maxwell's equations for the potential 
distribution in our multilayer structure (Extended Data Fig. 8a), which is 
created in response to a periodically modulated charge density of ~e^{iq·r} introduced 
in the graphene plane, where r is the two-dimensional coordinate on the 
graphene plane. In the quasi-static limit of large momenta, q ≫ ω/c, the solution 
of this electrodynamic problem is straightforward, so we present only the 
final result 
It is instructive to consider two limiting cases. If d_1, d_2 → ∞, the exponential 
factors vanish, so the effective permittivity κ_BN = √ε_ab √ε_c of bulk hBN is obtained, 
as expected. In the opposite limit, d_1, d_2 → 0, κ reduces to the average of the 
permittivities of the semi-infinite top and 
bottom layers (vacuum and SiO2), whereas the permittivity of hBN does not enter 
the calculation. Our device had d_1 = 7 nm and d_2 = 14 nm (Extended Data Fig. 8a). 
In this intermediate case, it is imperative to properly account for contributions 
from both hBN and SiO2 losses. 

As mentioned in ‘The optical constants of hBN’, the temperature dependence of 
the damping constant γ_c of the hBN c-axis permittivity is not experimentally attainable 
(where the c axis is the high-symmetry axis perpendicular to the a-b plane). 
Owing to the thermal occupation factors involved, both γ_c and γ_ab are expected to 
decrease monotonically as the temperature is lowered. We thus assume that γ_c has the same 
temperature dependence as γ_ab, that is, 

\gamma_c(T) = \gamma_c(300\,\mathrm{K})\,\frac{\gamma_{ab}(T)}{\gamma_{ab}(300\,\mathrm{K})} \qquad (9)
To determine γ_c(300 K), we performed nano-FTIR experiments for an hBN 
crystal taken away from the sample edges (Extended Data Fig. 8b), from which 
γ_c(300 K) = 3.4 cm⁻¹ can be obtained via numerical modelling, in 
agreement with previous work*. For the plasmon wavelength λ_p = 170 nm, the 
final estimate of the environmental dielectric loss 

\gamma_{\mathrm{env}}(T) = \omega\,\frac{\kappa''(\omega)}{\kappa'(\omega)}

calculated with the help of equations (7)-(9), is plotted in Extended Data Fig. 8c 
for γ_c(300 K) = 3.4 cm⁻¹. Clearly, γ_env(T) is well approximated by a linear function. 
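
The additive split of 1/Q into an intrinsic part and an environmental part can be checked against the numbers in Table 1. The Python sketch below assumes that the environmental term equals γ_env/ω, with both quantities in cm⁻¹; this form is inferred here from the table values rather than copied from the paper's equations, and the function name is hypothetical.

```python
def total_quality_factor(q_intrinsic, gamma_env_cm, omega_cm):
    """Combine intrinsic and environmental losses: 1/Q = 1/Q_int + gamma_env/omega."""
    return 1.0 / (1.0 / q_intrinsic + gamma_env_cm / omega_cm)


OMEGA_CM = 886.0      # imaging frequency in cm^-1 (Table 1 footnote)
GAMMA_ENV_CM = 5.9    # dielectric loss of the environment in cm^-1 (Table 1 footnote)
Q_INTRINSIC = 970.0   # intrinsic quality factor at T = 60 K (Table 1)

q_total = total_quality_factor(Q_INTRINSIC, GAMMA_ENV_CM, OMEGA_CM)
gamma_intrinsic_cm = OMEGA_CM / Q_INTRINSIC  # intrinsic scattering rate in cm^-1
print(q_total, gamma_intrinsic_cm)
```

With these inputs the combined quality factor is close to the experimental value of 130, and the implied intrinsic scattering rate of about 0.9 cm⁻¹ is consistent with the bound of <2.0 cm⁻¹ at T = 60 K quoted in the main text.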
Electron scattering by the acoustic phonons of graphene. The scattering rate γ was 
defined in equation (4) as one of the two parameters (along with the Drude weight, 
D, in equation (5)) that define the graphene conductivity σ = σ' + iσ''. In this section, 
we discuss theoretical estimates of this scattering rate, which is generally 
temperature-dependent. In this discussion, the conductivity appears as a primary 
quantity and γ as a derived one. In the case of interest, γ ≪ ω, it is given by 

\gamma(\omega) = \omega\,\frac{\sigma'(\omega)}{\sigma''(\omega)} \simeq \frac{\pi\omega^{2}}{D}\,\sigma'(\omega) \qquad (10)
In principle, the conductivity and the scattering rate depend on both the frequency, 
ω, and the momentum, q. The characteristic scale of the momentum 
dependence is ω/v_F, where v_F is the Fermi velocity. The plasmon momentum, 
q_p ≪ ω/v_F, is much smaller. Therefore, to calculate the plasmonic losses it is sufficient 
to compute the zero-momentum scattering rate γ(ω) = γ(q = 0, ω), which is 
an easier task. Noting that the power P absorbed per unit area of the system in the 
presence of an in-plane electric field Ee^{−iωt} + E*e^{iωt} is equal to P = 2σ'(ω)|E|², the 
desired scattering rate can be expressed as 
We begin with the acoustic phonon contribution to γ, which we denote γ_a. The 
velocities of the longitudinal and transverse acoustic phonons in graphene, 
v_l = 2.2 × 10⁶ cm s⁻¹ and v_t = 1.4 × 10⁶ cm s⁻¹, are much smaller than the Fermi 
velocity, v_F ≈ 1.0 × 10⁸ cm s⁻¹. This implies that the perturbations created by 
acoustic phonons are quasi-static for electrons. In conventional metals and semi- 
conductors, such perturbations are described in terms of the deformation potential. 
However, in graphene, it has become common to refer only to a part of the total 
perturbation as the deformation potential, namely, the part that is proportional to 
the diagonal component of the strain. The remaining part is referred to as the gauge 
field or the pseudo-magnetic field*®. The physical picture that goes with this ter- 
minology is that the deformation potential shifts the Dirac cones of graphene in 
energy, whereas the pseudo-magnetic field moves them in the momentum space, 
in opposite directions for the two valleys. The issue of the relative importance of 
deformation potential and pseudo-magnetic field remains unsettled. Although the 
deformation potential is absent*” or very small** in a naive tight-binding model, it 
is nevertheless often assumed to be the dominant contribution!*!9*9, However, 
recent first-principles calculations”*! found that the deformation potential is 
indeed negligible. We assume this to be the case and consider pseudo-magnetic 
field scattering only. This permits us to neglect electron—electron interactions 
because the pseudo-magnetic field, unlike the deformation potential, does not 
perturb the electron density. 

To find the phonon-induced correction to the power absorption P, we use the 
Fermi golden rule (see Extended Data Fig. 9) 
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Here A = cE/(iw) is the in-plane component of the vector potential; c is the 
speed of light, N = 4 is the spin-valley degeneracy; ¢= 1 (¢,=—1) represents the 
emission of a phonon (photon) and ¢= —1 (¢,= 1) represents the absorption of a 
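phonon (photon); $\varepsilon_{S\mathbf{k}} = S\hbar v_F|\mathbf{k}|$ is the energy of an electron with momentum $\mathbf{k}$ and
band index $S = \pm 1$; $\omega_{\nu\mathbf{q}} = v_\nu|\mathbf{q}|$ is the frequency of the phonon of type $\nu$ and
momentum $\mathbf{q}$; $f(\varepsilon)$ and $n_B(\varepsilon)$ are the Fermi and Bose distribution functions,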
respectively; and k’ =k — q. The matrix elements WY _, ,= ww" si of the 
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electron-phonon coupling and the electron velocity y 
where $\beta_A = 5.0$ eV is the pseudo-magnetic field coupling constant,
$\rho_m = 7.6 \times 10^{-8}$ g cm$^{-2}$ is the graphene mass density, $s_\nu$ is equal to 0 (1) for
$\nu = l$ ($\nu = t$) phonons and $\theta_{\mathbf{q}}$ is the polar angle of $\mathbf{q}$, with the x axis being along a
zigzag direction.

Numerical evaluation of $\gamma_a(\omega, T)$ using equations (11)-(13) for $\omega = 886$ cm$^{-1}$
and for the d.c. limit $\omega \to 0$ gives the results shown in Extended Data Fig. 10a by
the green and blue lines, respectively. These lines are almost straight and nearly
coincide with one another in the range of our experiments, $T > 60$ K. For the d.c.
scattering rate $\gamma_a(\omega = 0, T)$, such a linear behaviour with respect to $T$ is
expected$^{18-20,37-39,41}$ and can be derived analytically by assuming $|\mu| \gg \omega, T$ and
$T > T_{BG}$, where $T_{BG} = 2|\mu|\,v_l/v_F$ is the Bloch-Grüneisen temperature. This set of
inequalities permits one to neglect $\omega_{\nu\mathbf{q}}$ inside the $\delta$-function and to replace $f(\varepsilon)$ by
a step function and $n_B(\varepsilon)$ by $T/\varepsilon$. After some algebra, we find
The first equation in equation (14) is in agreement with prior work$^{20,38,41}$, as is
also the common observation that the linear temperature dependence sets in
already at $T \approx 0.2\,T_{BG} \approx (10\,\mathrm{K})\sqrt{n\,[10^{12}\,\mathrm{cm}^{-2}]} = 25$ K; see Extended Data Fig. 10.
The second equation in equation (14) is our new result. Importantly, it explains
why the infrared scattering rate $\gamma_a(\omega, T)$ at the frequency of interest,
$\omega = 886$ cm$^{-1}$ $\approx 0.4|\mu|$, is hardly different from the d.c. rate $\gamma_a(0, T)$ in Extended
Data Fig. 10a.
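The orders of magnitude quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative rather than part of the original analysis: it assumes that the longitudinal phonon velocity enters the Bloch-Grüneisen temperature, uses $\mu = 0.27$ eV as quoted later in the text, and converts units with $\hbar = k_B = 1$ restored through standard constants. All variable names are hypothetical.

```python
import math

# Physical constants (CODATA values, rounded).
K_B_EV_PER_K = 8.617333262e-5       # Boltzmann constant in eV/K
HBAR_EV_S = 6.582119569e-16         # reduced Planck constant in eV*s
EV_PER_INVERSE_CM = 1.239841984e-4  # photon energy of 1 cm^-1 in eV

# Parameters quoted in the text.
mu_ev = 0.27       # chemical potential in eV
v_fermi = 1.0e8    # Fermi velocity in cm/s
v_long = 2.2e6     # longitudinal acoustic phonon velocity in cm/s (assumed to set T_BG)
omega_cm = 886.0   # probing frequency in cm^-1

# Bloch-Grueneisen temperature, T_BG = 2|mu| v_l / v_F, converted to kelvin.
t_bg_kelvin = 2.0 * mu_ev * (v_long / v_fermi) / K_B_EV_PER_K
onset_kelvin = 0.2 * t_bg_kelvin

# Carrier density for spin-valley degeneracy N = 4: n = k_F^2 / pi.
k_fermi = mu_ev / (HBAR_EV_S * v_fermi)   # in cm^-1
density = k_fermi ** 2 / math.pi          # in cm^-2
onset_from_density = 10.0 * math.sqrt(density / 1e12)  # (10 K) * sqrt(n / 10^12 cm^-2)

# Ratio of the probing frequency to the chemical potential.
omega_over_mu = omega_cm * EV_PER_INVERSE_CM / mu_ev

print(t_bg_kelvin, onset_kelvin, onset_from_density, omega_over_mu)
```

With these inputs the onset temperature comes out in the 20-30 K range by either route, and $\omega/|\mu|$ is close to 0.4, consistent with the estimates in the text.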

The weak frequency dependence given by our formula is inconsistent with the
rapid increase of $\gamma_a(\omega, T)$ with $\omega$ obtained in a previous theoretical work$^{18}$ that
studied electron-phonon scattering by the deformation potential mechanism.
By examining the formulas given in ref. 18, we traced the difference between the
deformation potential and pseudo-magnetic field cases to a strongly $\omega$-dependent
electron screening. This screening influences the deformation potential in
electron-phonon coupling but plays no role in the pseudo-magnetic field channel
we consider here.

At temperatures $T < 0.2\,T_{BG}$ the assumptions used in deriving equation (14) do
not hold. Such temperatures have not been reached in our experiments, but we can
discuss the predictions of our theory. In this ultimate low-temperature regime, the
d.c. and alternating current (a.c.) scattering rates would no longer have the same
scaling with temperature. The d.c. rate (Extended Data Fig. 10a, blue line) should
vanish, whereas the a.c. one (Extended Data Fig. 10a, green line) should approach
a non-zero value as $T \to 0$. The physical reason for this difference is that the
scattering rate $\gamma_a(\omega, 0)$ at absolute zero is solely due to emission of acoustic phonons,
which is possible only at a finite $\omega$.

The red and black solid lines in Extended Data Fig. 10a represent the electron- 
phonon inter-valley scattering rate. This process, as well as the total electron- 
phonon scattering rates plotted in Extended Data Fig. 10b and c, are discussed in 
the next section. 

Inter-valley electron-phonon scattering. Inter-valley electron-phonon scattering 
is dominated by transverse optical phonons at the K-point of the Brillouin 
zone, the so-called $A_1'$ phonons®. The matrix element involved in this scattering
where $\beta_K$ is the electron-$A_1'$-phonon coupling constant. According to previous
calculations?” of the corresponding d.c. scattering rate $\gamma_K(0)$, $A_1'$-phonon
scattering is activated at $T \approx 0.15\,\hbar\omega_K \approx 250$ K, and at room temperature it is
responsible for a large fraction of the total $\gamma(0) \approx \gamma_a(0) + \gamma_K(0)$. Here we show that
inter-valley scattering has an even larger contribution $\gamma_K(\omega)$ to the infrared
scattering rate and that it begins at a temperature as low as 100 K. This scattering
mechanism can therefore naturally explain the observed super-linear temperature 
increase of the plasmon damping seen in Extended Data Fig. 10. 
Within a naive tight-binding model?®?’, one finds $\beta_K = 4\beta_A/b = 14$ eV Å$^{-1}$,
where $b = 1.42$ Å is the carbon-carbon bond length. However, virtual
electron-phonon scattering processes, such as electron screening and vertex corrections, are important
for the inter-valley processes. These many-body effects cause a red shift of the
$A_1'$-phonon frequency $\omega_K$ (Kohn anomaly”) and an enhancement*!~* of the
electron-phonon coupling constant $\beta_K$. Both effects are difficult to compute
accurately, and so we treat $\omega_K$ and $\beta_K$ as adjustable parameters.

To calculate $\gamma_K$ we again use the Fermi golden rule, equation (12), this time
with the matrix element from equation (15). For high doping, $|\mu| \gg \omega$, we can
keep only the intraband terms $S'' = S' = S$, which are the dominant ones. After
some algebra, we arrive at
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This integral can be readily evaluated numerically**. At low temperatures the 
growth of $\gamma_K(\omega) \propto \exp[-(\omega_K - \omega)/T]$ is roughly exponential because it is dictated by
the thermal tails of the occupation factors of the phonons and electrons. This
consideration makes it clear why the infrared scattering rate has an earlier onset in $T$ than
the d.c. one. In Extended Data Fig. 10c we show the results of our calculations for the
total rate $\gamma = \gamma_a + \gamma_K$. The circles (squares) represent the plasmon damping rates
extracted from the experiment, with the environmental loss rate chosen as in Extended
Data Fig. 8c. For both d.c. and infrared cases, shown by the red and blue curves,
respectively, good agreement between the theory and the experimental data is achieved
for $\beta_K = 14$ eV Å$^{-1}$ and $\omega_K = 1{,}200$ cm$^{-1}$. Note that these values are essentially the same
as the results of previous first-principles calculations”#", where $\beta_K = 13.9$ eV Å$^{-1}$ and
$\omega_K = 1{,}209$ cm$^{-1}$. Other parameters that we used were $\mu = 0.27$ eV and $\beta_A = 5.0$ eV.
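The numbers quoted in this section can be cross-checked with simple arithmetic. The sketch below is illustrative only: it reads the tight-binding relation as $\beta_K = 4\beta_A/b$ (an interpretation chosen because it reproduces the quoted 14 eV Å$^{-1}$), takes $\omega_K = 1{,}200$ cm$^{-1}$ from the fit, and evaluates the low-temperature activation factor $\exp[-(\omega_K - \omega)/T]$ at the infrared and d.c. frequencies. The helper name `activation_factor` is hypothetical.

```python
import math

KELVIN_PER_INVERSE_CM = 1.438776877  # hc/k_B: temperature equivalent of 1 cm^-1

beta_a_ev = 5.0         # pseudo-magnetic field coupling constant in eV
bond_length_a = 1.42    # carbon-carbon bond length in angstrom
omega_k_cm = 1200.0     # fitted A1' phonon frequency in cm^-1
omega_ir_cm = 886.0     # infrared probing frequency in cm^-1

# Tight-binding estimate of the inter-valley coupling (assumed reading: 4 * beta_A / b).
beta_k = 4.0 * beta_a_ev / bond_length_a  # in eV per angstrom

# Activation temperature quoted for the d.c. rate, T ~ 0.15 * hbar * omega_K.
activation_kelvin = 0.15 * omega_k_cm * KELVIN_PER_INVERSE_CM


def activation_factor(omega_cm, temperature_kelvin):
    """Boltzmann-like factor exp[-(omega_K - omega)/T] with frequencies in cm^-1."""
    gap_kelvin = (omega_k_cm - omega_cm) * KELVIN_PER_INVERSE_CM
    return math.exp(-gap_kelvin / temperature_kelvin)


infrared_factor = activation_factor(omega_ir_cm, 100.0)
dc_factor = activation_factor(0.0, 100.0)

print(beta_k, activation_kelvin, infrared_factor, dc_factor)
```

At 100 K the infrared factor exceeds the d.c. one by several orders of magnitude, which is the arithmetic behind the earlier onset of the infrared scattering rate.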
Electron-electron scattering. In this section we estimate the plasmon damping due 
to electron-electron interactions using an original method. For previous work on 
this subject see, for example, ref. 8. In general, electron interactions can influence
the conductivity only if the Galilean invariance of the system is broken“, which is 
a weak effect in doped graphene at low temperatures. 

We begin by noting that the electron-electron collision rate** 


(16) 


is comparable to the phonon-induced scattering rate; see Extended Data Figs. 10 
and 11. However, such collisions conserve the total momentum and thus 
approximately conserve the total current as well. In fact, there exists a nonequi- 
librium, current-carrying state that is immune to electron-electron collisions, 
namely, a Fermi sea with a Doppler-shifted energy spectrum $\varepsilon_{S\mathbf{k}} \to \varepsilon_{S\mathbf{k}} - \hbar u^{x} k^{x}$.
This state corresponds to a uniform hydrodynamic flow with velocity $u^{x}$. In
comparison, the Fermi distribution of noninteracting electrons subject to an
electric field $E$ is typically shifted in momentum as $\hbar k^{x} \to \hbar k^{x} - \Delta p^{x}$ with
$\Delta p^{x} = eE\gamma^{-1}$. At low temperatures, this latter state is similar, but not identical,
to a hydrodynamic flow with velocity $u^{x} = \Delta p^{x}/m$, where $m = \hbar k_F/v_F$ is the
effective mass and $k_F$ is the Fermi momentum. We can call these two types of
perturbed distribution functions the velocity mode and the momentum mode,
respectively. The Galilean invariance is indeed broken because $m$ depends on
the electron energy. However, this dependence is weak; therefore, we can
anticipate that the damping rate entering the conductivity is much smaller than the
electron collision rate $\Gamma_{ee}$.

To proceed, we suppose that both momentum-conserving and nonconserving 
processes are present, with rates $\Gamma_{ee}$ and $\Gamma_d$, respectively. As shown in recent work”,
the form of the conductivity can be obtained by assuming that the perturbation 
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of the electron distribution function is a linear combination of the velocity and 
momentum modes. This ansatz leads to the two-term Drude formula 
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(see also ref. 4°) in lieu of the conventional single-term one, equation (4). We can 
still define the effective electron-electron scattering rate similarly to equation (10) 
to obtain 


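$$\gamma_{ee} \simeq \left(1 - \frac{D_h}{D}\right)\Gamma_{ee}, \qquad \omega \gg \Gamma_{ee},\ \Gamma_d \qquad (17)$$
Here $D = \int_{-\infty}^{\infty} \sigma'(\omega)\,d\omega$ is the total optical weight given by equation (4) and
$D_h = \pi e^{2} n/m_h$ is the hydrodynamic Drude weight. For weak interactions, the
corresponding hydrodynamic mass $m_h$ is given by**“6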
where $\Theta$ is the unit step function and $\mathrm{Li}_n(z)$ is the polylogarithm function of order
$n$ and argument $z$. The crucial point is that at the temperatures of interest, $T \ll |\mu|$,
the two Drude weights, $D$ and $D_h$, are nearly equal. As a result, the plasmon
damping rate $\gamma_{ee}$ due to electron interactions is indeed much smaller than the electron
collision rate $\Gamma_{ee}$; see equation (17) and Extended Data Fig. 11.

Data availability. The data that support the findings of this study are available 
from the corresponding author on reasonable request. 
Extended Data Fig. 1 | AFM topography and $s(\omega)$ image of the hBN/graphene/hBN
encapsulated device. a, AFM topography image. The black dashed line marks the
physical edge of graphene. b, c, Room-temperature $s(\omega)$ images obtained at
$V_g = 0$ V and $V_g = 50$ V, respectively. G, graphene.
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Extended Data Fig. 2 | Plasmonic fringes near a graphene edge at 
different temperatures. a, Maps of the near-field amplitude s measured at 
several temperatures for fixed $V_g = 75$ V and $\omega = 886$ cm$^{-1}$. b, Line profiles
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obtained by averaging over the vertical coordinates in each map. The solid 
lines are the data and the dash-dotted lines are the simulations. 
Extended Data Fig. 3 | Gate dependence of plasmon propagation at cryogenic
temperature. a, Near-field amplitude $s$ as a function of the gate voltage and the
distance $L$ from the Au launcher at a fixed frequency of $\omega = 886$ cm$^{-1}$.
b, Illustration of the gate sweeping sequence. c, Line profiles (averaged over
~200 nm perpendicular to the propagation direction) of plasmonic interference
fringes at different gate voltages.
Extended Data Fig. 4 | Optical image of an hBN sample on a SiO$_2$/Si substrate. The dashed red square marks the approximate location of the region
used in the reflectance measurements. The hBN thickness is 17 nm.
Extended Data Fig. 5 | Reflectance spectra of the hBN/SiO$_2$/Si structure
at T = 60-300 K. a-f, The black points are the experimental data and the
red dashed lines are the fits. The sharp resonance at 1,370 cm$^{-1}$ is due
to the in-plane optical phonon of hBN, and the broad peak at around
1,070 cm$^{-1}$ is due to the optical phonon of SiO$_2$.
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Extended Data Fig. 6 | The hBN in-plane phonon parameters versus the temperature. a, The fitted phonon linewidth y,. The squares are the data and 
the dashed line is a guide for the eye. b, The fitted frequency u,. 
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Extended Data Fig. 7 | Plasmonic oscillations in the vicinity of the Au 
launching pads. a, Illustration of the Au pad (gold, with the triangular 
mesh used in the simulation) on graphene (honeycomb lattice). The 

red arrow depicts the direction of the external field. The blue arrow 
symbolizes launched plasmons. Scale bar, 200 nm. b, An example of the 
simulated electric field distribution (E’, imaginary part) just above the 


graphene layer. The grey rectangle represents the right half of the Au pad. 
The simulation parameters are $\lambda_p = 170$ nm and $Q = 130$. c, Comparison
of the theoretical fit (blue) and the experimental data (red) for $V_g = 75$ V
and T=60K. Both the theoretical and experimental traces are obtained 
by averaging multiple line cuts inside the 600-nm-wide strip indicated by 
the white dashed lines in b. The phase shift 6 = 112° (see text) was used to 
align the oscillations. 
Extended Data Fig. 8 | Device structure and dielectric losses.
a, Schematic of the device, showing the notations for the permittivities
and thicknesses of the layers. b, Nano-FTIR spectrum $s(\omega)$ obtained with
the hBN crystal taken away from the sample edges. c, The contribution
$\gamma_{\mathrm{env}}$ of the dielectric environment to the plasmon linewidth as a function
of temperature at a frequency of $\omega = 886$ cm$^{-1}$. The hBN c-axis damping
constant is $\gamma_c(300\,\mathrm{K}) = 3.4$ cm$^{-1}$, which is consistent with previous work.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


Extended Data Fig. 9 | Electron-phonon scattering processes. Diagrams of the scattering processes included in equation (12). The wavy, straight and 
dashed lines represent photons, electrons and phonons, respectively. 
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Extended Data Fig. 10 | Electron-phonon scattering rate as a function 
of temperature. a—c, Temperature dependence of the plasmonic scattering 
rate and of the d.c. scattering rate. Solid lines in a and b display the results 
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of parameter-free modelling for electron-phonon scattering contributions 
(a) and electron-electron scattering contributions (b). Solid lines in c
display the results of the sum in a and b, as discussed in the main text.
Extended Data Fig. 11 | Plasmon damping rate due to electron-electron
scattering as a function of temperature. The blue solid curve is the plasmon
damping rate $\gamma_{ee}$ due to electron-electron interactions, computed from
equations (17) and (18) for a Fermi energy of $\varepsilon_F = \mu(T = 0) = 0.27$ eV.
The red solid curve is the electron collision rate $\Gamma_{ee}$ from equation (16).
The squares and circles are $\Gamma_{ee}$ values extracted from recent d.c. transport
studies”>4 at a different carrier density, $n \sim 10^{12}$ cm$^{-2}$.
Extended Data Table 1 | Phonon oscillator parameters for SiO$_2$


04 (cm") (em) ¥,(em") 
1108.5 179.8 1331 
1104 219.7 12.6 
1098 203.4 12.76 
1092 330.8 19.6 
1086 392.4 19 
1077 355 13.76 
1067 344 13.2 
1059 386.9 16 
1052 221 9.7 
1045 302.8 13.2 
1035 237.7 13 
1030 37.68 16.9 


The fitting parameters of equation (6) for T=300K. 
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Extended Data Table 2 | Phonon oscillator parameters for hBN 
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The single-Lorentzian fitting parameters of equation (6) for T= 300K. See also Extended Data Figs. 6, 8. 
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Addressing the minimum fleet problem in 
on-demand urban mobility 


M. M. Vazifeh!*, P. Santi!, G. Resta’, S. H. Strogatz? & C. Ratti! 


Information and communication technologies have opened 
the way to new solutions for urban mobility that provide better 
ways to match individuals with on-demand vehicles. However, a 
fundamental unsolved problem is how best to size and operate a fleet 
of vehicles, given a certain demand for personal mobility. Previous 
studies!~> either do not provide a scalable solution or require 
changes in human attitudes towards mobility. Here we provide a 
network-based solution to the following ‘minimum fleet problem’:
given a collection of trips (specified by origin, destination and start
time), how to determine the minimum number of vehicles needed
to serve all the trips without incurring any delay to the passengers.
By introducing the notion of a ‘vehicle-sharing network’, we present
an optimal computationally efficient solution to the problem, as well 
as a nearly optimal solution amenable to real-time implementation. 
We test both solutions on a dataset of 150 million taxi trips taken in 
the city of New York over one year®. The real-time implementation 
of the method with near-optimal service levels allows a 30 per cent 
reduction in fleet size compared to current taxi operation. Although 
constraints on driver availability and the existence of abnormal trip 
demands may lead to a relatively larger optimal value for the fleet 
size than that predicted here, the fleet size remains robust for a 
wide range of variations in historical trip demand. These predicted 
reductions in fleet size follow directly from a reorganization of 
taxi dispatching that could be implemented with a simple urban 
app; they do not assume ride sharing’~’, nor require changes to 
regulations, business models, or human attitudes towards mobility 
to become effective. Our results could become even more relevant 
in the years ahead as fleets of networked, self-driving cars become 
commonplace!*"14, 

Two trends—the rise of the autonomous and connected car, and the 
emergence of a ‘sharing economy’’”"! of transportation—seem poised 
to revolutionize the way personal mobility needs will be addressed in 
cities. The way current modes of transportation such as the private 
car, taxi or bus operate will be challenged and increasingly replaced by 
personalized, on-demand mobility systems operated by vehicle fleets, 
similar to what companies like Uber and Lyft offer. If such trends con- 
tinue, they could lead to the displacement, or eventual disappearance, 
of jobs for bus and taxi drivers. Along with these possible societal costs, 
the transportation revolution could also offer immense benefits, includ- 
ing opportunities to resolve existing inefficiencies in individual urban 
mobility'?"4, thereby reducing traffic, whose carbon footprint cur- 
rently accounts for about 23% of global greenhouse gas emissions!>"'®. 

To turn these opportunities into tangible environmental and soci- 
etal benefits, autonomous and on-demand mobility systems need to 
be designed and optimized for efficiency, and integrated with carbon- 
efficient public transport. This requires the definition of models and 
algorithms for the evaluation of shared mobility systems that are both 
computationally efficient and accurate. The former property is man- 
dated by the need to cope with hundreds of thousands (or sometimes 
millions) of trips routinely occurring in a large city. The latter property 
determines the relevance of the model results to the real world. 


In what follows, we solve the ‘minimum fleet problem’ for the general
case of on-demand mobility, and show that its solution for a specific 
case—taxi trips—could lead to breakthroughs in operational efficiency. 
To the best of our knowledge, no publicly available solution currently 
exists to address this minimum fleet-size problem at the urban scale 
for on-demand mobility in both private and public sectors. On the one 
hand, accurate methods based on mathematical programming (as tra- 
ditionally used in the design of transportation systems'~>”) can handle 
only a few thousand trips or vehicles at most, which is well below the 
hundreds of thousands or even millions of trips or vehicles routinely 
operating in large cities. On the other hand, city-scale studies’” are 
obtained using a model of transportation based on aggregated mobility 
data and Euclidean spatial assumptions, and hence lack the resolu- 
tion necessary to estimate the urban-scale benefits of vehicle sharing 
accurately. 

We start from the notion of the shareability network introduced in 
ref. 7, which did not focus on the dispatching of vehicles. The type of
shareability network introduced here is profoundly different from the 
type studied previously: it models the sharing of vehicles, whereas pre- 
vious networks’~° modelled the sharing of rides. The main methodo- 
logical contribution of this Letter is to show how this vehicle-sharing 
network can be translated into an exact formulation of the minimum 
fleet problem as a minimum path cover problem on directed graphs, 
thus establishing a connection to the rich applied mathematics and 
computer science field of graph algorithms. Besides revealing a struc- 
tural property of vehicle-sharing networks, this connection allows the 
derivation of computationally efficient algorithms for optimal vehicle 
deployment and dispatching. Although optimally solving the mini- 
mum fleet size problem requires offline knowledge of daily mobility 
demand, in the following we also present a near-optimal, online version 
of the algorithm that can be executed in real time knowing only a small 
amount of the trip demand. 

We are given a collection $\mathcal{T}$ of individual trips representing a portion
of urban mobility demand during a certain time interval, such as a day.
Each trip $T_i \in \mathcal{T}$ is defined as a tuple $(t_i^{p}, t_i^{d}, l_i^{p}, l_i^{d})$, where $t_i^{p}$
represents the desired pick-up time, $l_i^{p}$ the pick-up location, $t_i^{d}$ the drop-off
time, and $l_i^{d}$ the drop-off location. Here, the pick-up time means the
earliest time $t_i^{p}$ at which the passenger can be picked up at location $l_i^{p}$.
The drop-off time means the estimated time of dropping off the
passenger, calculated using a travel-time estimation model and assuming
the passenger leaves the pick-up location at time $t_i^{p}$. In contrast to ref. 17,
travel times here are computed using the actual road network, and
using global positioning system (GPS)-based estimations derived from
the taxi trip dataset that account for hourly variations in traffic, as in
ref. 7. If the set $\mathcal{T}$ is extracted from a real-world dataset (for example,
taxi trips), the times $t_i^{p}$ and $t_i^{d}$ represent the actual times at which a
passenger is picked up and dropped off, respectively.

The minimum fleet problem is formally defined as follows: ‘find the
minimum number of vehicles needed to serve all trips in $\mathcal{T}$, given that
a vehicle is available at each $l_i^{p}$ on or before $t_i^{p}$’. A service designed
around this problem is ideal from a passenger’s perspective, since a
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Fig. 1 | Constructing the vehicle-shareability network. a, Several trip 
requests are depicted on the map index as T4,..., Tp. The coloured path on 
the map represents different possibilities for vehicle dispatching. 

b, Colours are used in the vehicle-shareability network to specify how 
various dispatches can be represented using paths on this network. One of 
the two dispatching scenarios required only two vehicles whereas the other 
required three. c, The optimal vehicle-dispatching routes are represented 
on the map. d, To construct the vehicle-shareability network, trip set T 
and the travel times are taken as inputs. Two trips are connected by a 


vehicle is guaranteed to be available at the desired location and time.
On the other hand, the above problem formulation might entail
substantial inefficiencies for the operator and the environment. Consider
two consecutive trips $T_\alpha$ and $T_\beta$ served by a single vehicle, and call the
time needed to connect them the (trip) connection time, formally
$t_{\alpha\beta} = t_\beta^{p} - t_\alpha^{d}$. If this time is very long, say, a few hours, it is trivially
possible to connect trips that occur at distant locations or times. Hence, an
excessively large connection time leads to inefficiencies for the operator
(longer travelled distances, lower vehicle occupancy ratio) and the
citizens (a lot of emissions and traffic just to connect trips). We therefore
re-formulate the problem as follows: ‘find the minimum number of
vehicles needed to serve all trips in $\mathcal{T}$, under the assumptions that (1)
a vehicle is available at each $l_i^{p}$ on or before $t_i^{p}$ and (2) the connection
time is at most $\delta$ minutes’, where the upper bound $\delta$ on the connection
time is a problem parameter.

Figure 1 illustrates the construction of the vehicle-shareability net- 
work that enables the minimum fleet problem to be optimally solved 
with parameter $\delta$. This is a directed network defined as $V = (N, E)$,
where node $n_i \in N$ corresponds to trip $T_i \in \mathcal{T}$ and the directed edge
$(n_i, n_j) \in E$ if and only if $t_i^{d} + t_{ij} \le t_j^{p}$ (which accounts for assumption (1)
above) and $t_j^{p} - t_i^{d} \le \delta$ (which accounts for assumption (2) above). Here,
$t_{ij}$ represents the estimated travel time between $l_i^{d}$ and $l_j^{p}$. The existence
of a link in the network indicates that the two incident trips can be 
consecutively served by a single vehicle, and a path in V corresponds 
to a sequence of trips that can be served by a single vehicle—that is, a 
dispatch. Therefore, solving the minimum fleet problem is equivalent 
to finding the number of paths (vehicles) in the minimum path cover 
of V. The solution also gives the optimal dispatching strategy, that is, a 
sequence of trips to be served for each vehicle in the minimum fleet. 
The problem of finding the minimum path cover on general graphs is 
NP-hard, but it can be solved efficiently on directed acyclic graphs’® 
using the Hopcroft-Karp algorithm for bipartite matching’. The 
acyclic nature of time guarantees that any vehicle shareability network 
is a directed acyclic graph, and the minimum fleet problem can be 
efficiently and optimally solved; see Methods for formal proofs. 
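The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Trip`, `build_network` and `minimum_path_cover` are hypothetical, the travel-time model is a toy Manhattan distance, and for brevity the bipartite matching uses a simple augmenting-path search (Kuhn's algorithm) in place of Hopcroft-Karp, which yields a matching of the same size but is asymptotically slower. The sketch assumes every trip has a drop-off time later than its pick-up time, so the network is acyclic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Location = Tuple[float, float]


@dataclass(frozen=True)
class Trip:
    pickup_time: float    # t^p, in minutes
    dropoff_time: float   # t^d, in minutes
    pickup_loc: Location  # l^p
    dropoff_loc: Location  # l^d


def build_network(trips: List[Trip],
                  travel_time: Callable[[Location, Location], float],
                  delta: float) -> Dict[int, List[int]]:
    """Return edges[i] = trips j that one vehicle can serve directly after trip i."""
    edges: Dict[int, List[int]] = {i: [] for i in range(len(trips))}
    for i, first in enumerate(trips):
        for j, second in enumerate(trips):
            if i == j:
                continue
            t_ij = travel_time(first.dropoff_loc, second.pickup_loc)
            arrives_in_time = first.dropoff_time + t_ij <= second.pickup_time
            short_connection = second.pickup_time - first.dropoff_time <= delta
            if arrives_in_time and short_connection:
                edges[i].append(j)
    return edges


def minimum_path_cover(edges: Dict[int, List[int]], n: int) -> List[List[int]]:
    """Minimum path cover of a DAG via maximum bipartite matching."""
    next_trip = [-1] * n  # next_trip[i]: trip served right after i by the same vehicle
    prev_trip = [-1] * n  # prev_trip[j]: trip served right before j by the same vehicle

    def try_augment(i: int, seen: set) -> bool:
        for j in edges[i]:
            if j in seen:
                continue
            seen.add(j)
            if prev_trip[j] == -1 or try_augment(prev_trip[j], seen):
                prev_trip[j] = i
                next_trip[i] = j
                return True
        return False

    for i in range(n):
        try_augment(i, set())

    # Each trip without a predecessor starts one vehicle's dispatch sequence.
    paths = []
    for start in range(n):
        if prev_trip[start] == -1:
            path = [start]
            while next_trip[path[-1]] != -1:
                path.append(next_trip[path[-1]])
            paths.append(path)
    return paths


def manhattan_minutes(a: Location, b: Location) -> float:
    """Toy travel-time model: one minute per unit of Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Four made-up trips, for illustration only.
trips = [
    Trip(0, 10, (0, 0), (5, 0)),
    Trip(15, 25, (6, 0), (10, 0)),
    Trip(12, 20, (20, 0), (25, 0)),
    Trip(30, 40, (24, 0), (30, 0)),
]

dispatch = minimum_path_cover(build_network(trips, manhattan_minutes, delta=15), len(trips))
print(len(dispatch), dispatch)
```

In this toy instance two vehicles suffice when the connection time may be up to 15 min, whereas a connection bound of zero leaves every trip with its own vehicle. The number of paths equals the number of trips minus the size of the maximum matching, which is why a bipartite matching algorithm solves the path cover problem on acyclic networks.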




directed edge if there is a large enough gap in time between the drop-off of 
the first and the pick-up point of the next trip to allow a single vehicle to 
travel between the two points before the pick-up time of the second trip 
starts, according to the travel-time information. Furthermore, the upper 
bound $\delta$ on the trip connection time must not be exceeded. The
path-covering algorithm yields the path set that covers the entire node set,
ensuring that all trips are served, while minimizing the number of paths
(vehicles) in the solution. This is the optimal solution to the minimum
fleet problem with parameter $\delta$.


We have tested our methodology on a dataset of over 150 million taxi 
trips performed in the city of New York in the year 2011. This dataset 
has been selected from a number of available datasets® because it is 
publicly available and, thanks to taxi statistics published by the New 
York Taxi and Limousine Commission‘, it is possible to compare our 
methodology directly with current taxi operation. The data have been 
sliced into daily datasets T;, each of which is an input to the minimum 
fleet size problem. 

Next, we discuss how to set the parameter $\delta$. When $\delta$ is decreased to
0, we approach a situation in which each trip is served by a dedicated
vehicle: a solution with maximal vehicle utilization that is also optimal
for traffic (under the assumption that vehicles materialize at the origin
and dematerialize at the destination of the served trip) but incurring
prohibitive costs for the mobility operator. On the other hand, when
$\delta$ grows excessively, the fleet size is reduced, but the operational and
traffic efficiency problems described previously occur. Thus, the setting
of $\delta$ is an important design choice that is left to mobility operators,
traffic authorities and policy makers. In this study, we set $\delta = 15$ min, as
explained in Methods. The results of our method with different values
of $\delta$ are reported in Methods (see Extended Data Fig. 1).

Figure 2 shows the daily number of vehicles needed to address the 
entire taxi demand in New York City using our approach. The mini- 
mum number of vehicles needed to serve trips is correlated with the 
number of daily trips (see Fig. 2a), with an overall $R^2$ value of 0.74.
However, for the vast majority of days having between 300,000 and
550,000 trips (inset to Fig. 2a) this correlation becomes much weaker,
with an $R^2$ value of only 0.18. Thus, trip density is a first determinant of
fleet size, but trip spatiotemporal patterns are likely to play a large part 
as well. To investigate this issue further, we have analysed daily vehicle 
usage in the optimal solution. 

The vehicle usage analysis reported in Methods shows that a fraction 
of vehicles, ranging between 5% and 10%, are highly underutilized and 
serve only around 1% of the trips, a lower utilization pattern that occurs 
especially during the weekend and is probably related to the extra 
nightly demand. The analysis also highlighted clear weekly patterns in
vehicle use, consistent with the relatively stable vehicle fleet size across
the year. This observed stability can be explained by a simple model for
vehicle trip assignment, and is fundamental for mobility operators: it
indicates that investment in acquiring an optimal number of vehicles
for operation gives consistent yearly returns. The dip in vehicle fleet size
occurring at weekends also hints at an opportunity to perform routine
vehicle maintenance on a weekly basis.

Fig. 2 | Minimum fleet-size analysis. a, The interplay between the
number of daily trips and the minimum number of vehicles required to
serve them. The colours of the dots correspond to different weekdays.
Clustering of points with the same colour suggests weekly patterns,
which are confirmed by the yearly analysis reported in Methods (see
Extended Data Fig. 2). The plot shows a moderate correlation between the
two quantities. However, when focusing on days with a number of trips
ranging from 350,000 to 550,000 (inset), the correlation becomes much
weaker, as the fleet size remains within a narrow window of around 6,000
vehicles. b, The correlation between the number of daily trips and the total
fleet operation hours. The latter is defined as the total time of operation,
estimated by summing over the operation times for each vehicle in the
minimum fleet obtained over a day. The operation time for each vehicle
is defined as the total time a vehicle is operating on the road to serve the
trips. The total fleet operation hours correlate much more strongly with
the number of trips than the number of vehicles does.

A better scaling law relating vehicle fleet size to the daily number
of trips can be obtained by defining a metric for fleet sizing that
incorporates how long a vehicle is used during a day. We define a
‘full-time equivalent’ vehicle as a vehicle continuously operating 24 h
a day. (In the case of human-driven vehicles, we can think of having
the vehicle operated in three 8-h shifts, for instance.) Figure 2b
shows that the scaling law relating the number of daily trips with
full-time equivalent vehicles is more accurate than the previous one,
with the coefficient of determination R² increased from 0.74 to
0.91, and from 0.18 to 0.70 for the trip-intense days reported in the
inset.
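
As a minimal illustration of the full-time equivalent metric, the sketch below converts per-vehicle operation times into full-time equivalent vehicles by dividing the total fleet operation hours by 24 h. The function name and the sample numbers are hypothetical and are not taken from the analysis reported here.

```python
def full_time_equivalent_vehicles(operation_hours):
    """Convert per-vehicle daily operation times (in hours) to FTE vehicles.

    One full-time equivalent vehicle operates continuously for 24 h, so the
    FTE count is the total fleet operation hours divided by 24.
    """
    if any(h < 0 or h > 24 for h in operation_hours):
        raise ValueError("each vehicle operates between 0 and 24 h per day")
    return sum(operation_hours) / 24.0


# Hypothetical fleet: three vehicles operating 24 h, 12 h and 6 h in a day.
hours = [24.0, 12.0, 6.0]
fte = full_time_equivalent_vehicles(hours)
print(f"{len(hours)} vehicles correspond to {fte:.2f} FTE vehicles")
```

In this toy case three vehicles amount to fewer than two full-time equivalents, which is why the metric tracks the number of trips more closely than a raw vehicle count.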

Figure 3 shows the efficiency breakthrough provided by network- 
based optimization: when compared to current taxi operation in New 
York City, the number of circulating taxis can be reduced by an impres- 
sive 40%, and kept fairly constant through the day. This improvement is 
all the more noticeable considering that it is achieved without imposing 
any delay on customers, nor sharing of rides as in refs. ”°. That fleet 
size can be reduced by as much as 40% without the use of ride sharing 
and with no delay for passengers has, to the best of our knowledge, not 
been reported in the literature before, and it is one of the main results 
of this paper. 

The 40% fleet reduction reported above refers to the model with full
knowledge of daily trip demand. If only a portion of trip demand is
known, as in current on-demand mobility services where trip requests
are collected in real time, we can still achieve near-optimal performance
with the online version of the algorithm reported in the Methods. This
version collects trip requests for a short time, for example, one minute,
and locally optimizes vehicle dispatching based on this limited
knowledge. Figure 4 shows that, with a 30% fleet reduction and using the
online version of the algorithm, more than 90% of the trip requests can
be successfully served, hitting a performance very close to the 40% fleet
reduction possible when the entire daily demand is known beforehand.

Fig. 3 | Fleet efficiency comparison. Comparison between actual number
of New York City (NYC) taxis on the road® (black curve), versus the
minimum number of vehicles as computed by our optimal approach (red
curve). For the optimized approach, a breakdown of vehicles into ‘with
passenger’, ‘waiting to pick up’ and ‘driving to pick up’ is also reported. The
curves are computed by averaging data across one year. As shown by the
dotted lines corresponding to average deployment, optimized dispatching
brings a reduction in number of circulating taxis from 7,748 down to
4,627, a 40% reduction. With online operation ready to implement using
a smartphone app, our method provides a near-optimal 30% reduction in
the number of circulating taxis (see Fig. 4).

Fig. 4 | Performance of the network-based online vehicle-dispatching
model. a, Two-panel plot comparing the percentage of trips served
within a certain maximum delay $\max(\Delta t)$ based on the batch-based
optimized dispatching (blue line) versus sequential dispatching, as
described in the Methods. The panel at the top represents the results
for $\max(\Delta t) = 3$ min and the one at the bottom is for $\max(\Delta t) = 6$ min.
The use of network-based dispatching (blue line) provides a substantial
improvement in the percentage of successfully served trips with respect
to sequential dispatching (orange line). The fleet size is set to $N = N_{\min} x$,
where $N_{\min}$ is the average minimum fleet size obtained from running the
algorithm with daily trip knowledge based on the historical data for the
15 days considered in the evaluation. The fleet size inflation ratio $x$ is set
to 1.2, the maximum passenger waiting time is set to $\max(\Delta t) = 3$ min or
6 min, and the batching time in batch-based dispatching is set to 1 min.
b, The average percentage of trips successfully served within a 6-min delay
is above 90% for a fleet of 6,200 vehicles. Similar performance can only
be achieved with sequential dispatching when the fleet size is more than
10,000. c, Plots showing daily averages of the total percentage of trips in a
day successfully served within the specified delay for the same days as in a.

Our approach assumes that trip requests and vehicle-dispatching
decisions are centralized, a model that is radically different from
current taxi operation and similar to the one used by online mobility
operators. Therefore, the benefits of optimized operation reported in
Fig. 3 can be interpreted as being implied by the transition from a fully
distributed operation, where the deployment strategy is based on
individual driver decisions, to a centralized operation, where dispatching
decisions are globally optimized. To some extent, our results can then
be seen as a quantification of the well known game-theory notion
of the ‘price of anarchy’ in urban taxi operation. Taking a mobility
market perspective, this is a transition from a regulated mobility
market with numerous micro-operators (down to the level of the single taxi
driver), to a monopolistic market with a single mobility operator with
centralized operation. Although optimal from the vehicle operation
and environmental viewpoint, a monopolistic market is however highly
undesirable for many other reasons, most importantly, lack of
competition with consequent higher prices for customers. An additional
analysis reported in Methods shows that most of the efficiency benefits
of centralized vehicle operation are still possible in an oligopolistic market.

Although the characterization of minimum fleet size reported here 
is fully representative of an autonomous driving scenario where human 
operation of vehicles is not necessary, constraints on driver availability and 
maximum operating hours, shift operation and so on might produce rela- 
tively larger values of the minimum fleet requirement than those predicted 
here. Extending the concept of the vehicle-sharing network to incorporate 
driver constraints is possible and is left for further analysis. 

Broader effects on traffic are foreseen if our methodology is used
for optimizing urban ‘on-demand’ mobility services more generally,
especially in a future of autonomous vehicles. However, it is well known
that an improvement in mobility efficiency is sometimes linked with an
increase in demand which, in turn, could offset part of the traffic
reduction. Evaluating this ‘second-order’ effect of optimized fleet
operation on urban traffic requires coupling a micro-level traffic simulation,
agent-based passenger models and our network-based methodology, a
challenging task which we leave to future work.

Finally, we observe that, while applied here to taxi trips as a case
study, the proposed methodology for optimal vehicle fleet sizing and 
dispatching is general and can be applied to model any type of point-to- 
point mobility. However, the approach presented here focuses on opti- 
mizing and dispatching a single fleet of vehicles. Optimization across 
different fleets and transportation modes is possible by extending 
our approach to consider multiple coexisting fleets of various types 
to serve the mobility demand. With the approaching advent of auton- 
omous mobility and the forecast increase in sharing cars (or other 
autonomous vehicles, such as flying drones), the problem of how to 
optimize and orchestrate multiple autonomous fleets will come to 
the forefront, and might be addressed using the scalable and accurate 
analytical tools presented here for optimal solution of the ‘minimum 
fleet’ problem. 


Online content 

Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files,
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0095-1.

METHODS


Trip data. The dataset used in this work consists of more than 150 million trips 
with passengers of all 13,586 taxicabs in New York City during the calendar year 
of 2011. The dataset contains a number of fields from which we use the following: 
origin time, origin longitude, origin latitude, destination longitude and destination 
latitude. The measurement precision of times is in seconds; location information 
has been collected by the data provider via GPS location tracking technology. 
Out of our control are possible biases due to urban canyons (that is, streets with 
high-rise buildings on both sides), which might have slightly distorted the GPS 
locations during the collection process. All individual-level identifications are 
given in anonymized form; origin and destination values refer to the origins and 
destinations of trips, respectively. 

Map matching. Similar to the preprocessing done in ref. 7, we used data from
www.openstreetmap.org to create the street network of Manhattan. As described in
previous work (ref. 7), we used a filtering method on the streets of Manhattan to select only
the following road classes: primary, secondary, tertiary, residential, unclassified, 
road and living street. We left out several other classes deliberately. These include 
footpaths, trunks, links or service roads, as they are unlikely to contain delivery or 
pick-up locations. We extracted the street intersections to build a network in which 
nodes are intersections and directed links are roads connecting those intersections 
(we use directed links because a non-negligible fraction of streets in Manhattan are 
one-way). The extracted network of street intersections was then manually cleaned 
for obvious inconsistencies or redundancies (such as duplicate intersection points 
at the same geographic positions), in the end containing 4,091 nodes and 9,452 
directed links. This network was used to map match the GPS locations from the 
trip dataset. We matched only GPS locations (both trip origins and destinations) 
that are within 100 m of at least one node in the street intersection network, which 
is the case for the majority of trips, and discarded the remaining ones. After the 
preprocessing and filtering phase, more than 147 million trips remain to be used 
in the next phases of our analysis. 
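
The matching rule described above can be sketched as follows. This is an illustrative sketch only: it assumes that each GPS location is assigned to its nearest intersection, that distances are great-circle distances, and that a brute-force search is acceptable (a spatial index would be preferable at the scale of the full dataset). The function names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_to_node(point, nodes, max_dist_m=100.0):
    """Return the id of the nearest node within max_dist_m, or None."""
    best_id, best_dist = None, float("inf")
    for node_id, (lat, lon) in nodes.items():
        d = haversine_m(point[0], point[1], lat, lon)
        if d < best_dist:
            best_id, best_dist = node_id, d
    return best_id if best_dist <= max_dist_m else None


def match_trip(origin, destination, nodes, max_dist_m=100.0):
    """Match both trip ends; return None (trip discarded) if either end fails."""
    o = match_to_node(origin, nodes, max_dist_m)
    d = match_to_node(destination, nodes, max_dist_m)
    return None if o is None or d is None else (o, d)


# Hypothetical intersections and trips (coordinates are illustrative only).
nodes = {"A": (40.7580, -73.9855), "B": (40.7484, -73.9857)}
print(match_trip((40.7581, -73.9856), (40.7485, -73.9858), nodes))  # both ends matched
print(match_trip((40.7581, -73.9856), (40.8000, -73.9500), nodes))  # destination too far
```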

Travel-time computation. Travel-time information is a key part of building
vehicle-shareability networks. The knowledge of estimated travel times is based on
a heuristic method developed and used in ref. 7. This method uses pick-up and
drop-off times of a historical trip dataset and computes the travel times between
arbitrary origins and destinations on the road map.

In the following we briefly describe the core idea of this method. A detailed 
description can be found in the supplementary information of ref. 7. 

Each street segment belongs to the set $S = \{S_1, \ldots, S_p\}$ of all road segments
connecting any pair of adjacent intersections in the road map. Given a set of $k$
historical trips $T = \{T_1, \ldots, T_k\}$, the problem of travel-time computation is
estimating the travel time $x_i$ for each street segment $S_i \in S$ in such a way that the average
relative error (computed across all trips) between the actual travel time $t_i$ and the
estimated travel time $\hat{t}_i$ for trip $T_i$, computed starting from the $x_i$ (combined with
a routeing algorithm), is minimized. Once error-minimizing travel times for each
street in $S$ are determined, the travel time between any two intersections $i$ and $j$ can
be computed starting from the $x_i$ values, using a routeing algorithm that minimizes
the travel time between any two intersections.

The following steps are involved in the process of travel-time computation. 

First, we partition the trip set in time-sliced subsets $T_1, \ldots, T_{24}$, where subset $T_i$
contains all trips whose starting time is in hour $i$ of the day. If desired, finer
partitioning (for example, per hour and weekday, per hour and weekday and month,
and so on) is possible. The travel-time estimation process can be performed
independently on each of the time-sliced trip subsets. We define $T_{ij}$ as the subset of
trips with origin $x_i$ and destination $x_j$, in which $x_s$ contains the (latitude, longitude)
coordinates of the $s$th intersection after the trips are matched. A small fraction of
trips are filtered to remove ‘loop’ trips (that is, trips with the same origin and
destination), as well as excessively short or long trips. After a step in which initial
routes are computed using a pre-selected initial speed $v_{\mathrm{int}}$ (the same for all streets)
as described in ref. 7, a second trip-filtering step is performed, in which excessively
fast and slow trips are removed from the travel-time estimation process. After this
filtering, 97% of trips remain. The travel-time estimations obtained using this
method are reasonable, with a relatively lower average speed of around 5.5 m s$^{-1}$
estimated during rush hours (between 8 am and 3 pm), and peaks at around
8.5 m s$^{-1}$ at midnight.

Node-disjoint path cover. In the following we provide a set of definitions and
present relevant theorems with their proofs to systematically formulate the prob- 
lem of reducing the fleet size as a path-covering problem on a vehicle-shareability 
network. 

Given a directed network $V = (N, E)$, a path $P$ in $V$ is a sequence of edges
$\{e_1 = (n_1^1, n_1^2), \ldots, e_k = (n_k^1, n_k^2)\} \subseteq E$ such that $n_i^2 = n_{i+1}^1$ for each $i = 1, \ldots, k-1$.
The set of nodes in path $P$ is defined as $N(P) = \bigcup_{i=1}^{k} \{n_i^1, n_i^2\}$. The length of a path
$P$ is the number of edges $k$ that form it.

Definition 1 (Path cover). Given a directed network $V = (N, E)$, a node-disjoint
path cover of $V$ is a collection of paths $\{P_1, \ldots, P_h\}$ such that $\bigcup_{i=1}^{h} N(P_i) = N$ and
$N(P_i) \cap N(P_j) = \emptyset$ for any $i \neq j$. The size of the cover is the number of paths $h$ of
which it is formed.

We note that, under the conventional assumption that zero-length paths cor- 
responding to single nodes are allowed, a node-disjoint path cover always exists. 
In the following, to simplify presentation we drop the term ‘node-disjoint’ and 
use ‘path cover’ to refer to a ‘node-disjoint path cover’ as defined in ‘Definition 1 
(Path cover)’ above. 

Theorem 1. Let $C = \{P_1, \ldots, P_h\}$ be a path cover of the vehicle-shareability
network $V = (N, E)$. Then, all the trips in $T$ can be served by $h$ vehicles.

Proof. Consider a path $P = \{e_1 = (n_1^1, n_1^2), \ldots, e_k = (n_k^1, n_k^2)\}$ in the
vehicle-shareability network $V$. By definition of shareability network, the trips
corresponding to $n_1^1$ and $n_1^2$ (call them $T_1$ and $T_2$) can be served by a single vehicle.
Furthermore, the vehicle performing trip $T_1$ is guaranteed to arrive at the pick-up
location of $T_2$ within time $t_2^p$; that is, vehicle sharing does not impose any delay on
the starting time of the second trip. Also, the upper bound $\delta$ on the trip connection
time is not violated by the definition of shareability network. Hence, the vehicle
that serves $T_1$ and $T_2$ can be used to serve trip $T_3$ corresponding to node $n_2^2$ in $V$,
since the starting time of trip $T_2$ is not changed as a result of sharing, implying that
the condition ensuring shareability of $T_2$ and $T_3$ is still fulfilled. By iterating the
argument across all nodes in $N(P)$, we can conclude that all trips whose
corresponding nodes are in $N(P)$ can be served by a single vehicle. Thus, if a path cover
of size $h$ exists, we can conclude that all trips in $T$ can be served by $h$ vehicles.

Corollary 1. The minimum number of vehicles needed to serve the trips in T 
equals the size of the minimum path cover of the vehicle shareability network 
V=(N, E). 

Finding the size of the minimum path cover of an arbitrary directed network 
is NP-hard'® and hence computationally infeasible for large graphs. However, the 
optimal solution can be found in polynomial time if the network is acyclic, mean- 
ing that there is no directed path in the network forming a closed loop. 

Definition 2 (Directed acyclic network). A directed network $V = (N, E)$ is acyclic if it
has no directed cycles, that is, it does not contain directed paths starting at some
vertex $n \in N$ and eventually returning to $n$ again.

Any vehicle-shareability network as defined above is a directed acyclic network.
To see how the acyclic character arises one can use proof by contradiction. Assume
a cyclic path exists in $V$. For simplicity, assume the path has minimal length of 2. Let
$P = \{(n_1, n_2), (n_2, n_1)\}$ be a cyclic path, and let $T_1$ and $T_2$ be the trips corresponding
to $n_1$ and $n_2$, respectively. By the definition of vehicle-shareability network, we have
the following sequence of inequalities, in which $t_i^p$ and $t_i^d$ denote the pick-up and
drop-off times of trip $T_i$ and $t_{ij}$ denotes the travel time from the destination of $T_i$
to the origin of $T_j$:

$$t_1^d \le t_1^d + t_{12} \le t_2^p < t_2^d \le t_2^d + t_{21} \le t_1^p$$

which is a contradiction since $t_1^d > t_1^p$. Hence, no cyclic path of length 2 can exist
in $V$. The proof follows by straightforwardly extending the above sequence of
inequalities to cyclic paths of arbitrary length. This implies that the minimum
number of vehicles needed to perform a set $T$ of trips can be computed in
polynomial time. More specifically, it is shown that for directed acyclic networks the
problem of finding the path cover of minimum size is equivalent to the well known
maximum matching problem on bipartite graphs, which can be solved in time
$O(|E|\sqrt{|N|})$ using the Hopcroft-Karp algorithm.
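
The reduction from minimum path cover to bipartite matching can be sketched as follows. Each node is split into a 'left' and a 'right' copy, each edge (i, j) links the left copy of i to the right copy of j, and the size of the minimum path cover equals the number of nodes minus the size of a maximum matching. For brevity this sketch uses a simple augmenting-path matching (often called Kuhn's algorithm) rather than Hopcroft-Karp; it yields a matching of the same size but with a worse running time. The function names and the toy network are hypothetical.

```python
def max_bipartite_matching(n_nodes, edges):
    """Maximum matching between left copies and right copies of the nodes.

    Each directed edge (i, j) links the left copy of i to the right copy of j.
    Returns match_right, where match_right[j] is the node matched to the right
    copy of j (that is, the predecessor of j on its path), or -1.
    """
    adj = [[] for _ in range(n_nodes)]
    for i, j in edges:
        adj[i].append(j)
    match_right = [-1] * n_nodes

    def try_augment(u, visited):
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    for u in range(n_nodes):
        try_augment(u, set())
    return match_right


def minimum_path_cover(n_nodes, edges):
    """Return the node-disjoint paths of a minimum path cover of a DAG."""
    match_right = max_bipartite_matching(n_nodes, edges)
    successor = {u: v for v, u in enumerate(match_right) if u != -1}
    has_predecessor = {v for v, u in enumerate(match_right) if u != -1}
    paths = []
    for start in range(n_nodes):
        if start in has_predecessor:
            continue  # not the first trip of any vehicle
        path = [start]
        while path[-1] in successor:
            path.append(successor[path[-1]])
        paths.append(path)
    return paths


# Toy shareability network with five trips (nodes 0..4).
toy_edges = [(0, 1), (1, 2), (0, 3), (3, 4), (1, 4)]
cover = minimum_path_cover(5, toy_edges)
print(len(cover), "vehicles:", cover)
```

Each returned path is the ordered list of trips served by one vehicle, so the number of paths is the minimum fleet size for the toy network.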

Online model. The results shown so far compute the minimum infrastructure on
the basis of the knowledge of the entire shareability network for the day considered.
This is analogous to the Oracle model as defined in ref. 7, and is consistent with a
scenario in which trip requests are issued in advance (for example, through a
reservation system). To investigate to what extent the above-described benefits extend
to systems where trip requests are issued in real time (such as Uber and Lyft), we
repeat the analysis in the so-called online model. In the online model, we have a
number of vehicles available for serving trips, which is defined as $N = N_{\min} x$, where
$N_{\min}$ is the minimum fleet size for the day of reference as computed by the oracle
model, and $x > 1$ is an inflating factor. We then start serving trip requests with the
available vehicles, whose initial position is determined through a warm-up phase in
which a number of trip requests from the previous day (not counted in
the results) are served. To compare online models, two possible strategies are used
to dispatch vehicles and serve trip requests, as follows.

On-the-fly. Trip requests are served sequentially; when a new trip request is issued, 
the dispatched vehicle is chosen as the first available vehicle that minimizes pas- 
senger waiting time. 

Batch. Trip requests are collected for time $\delta = 1$ min and processed in batches.
When a batch is processed, a maximum matching is computed to maximize the
number of requests that can be successfully served (that is, served within
$\max(\Delta t) = 6$ min); vehicles are then dispatched on the basis of the result of the
maximum matching algorithm, as explained in the following. At each given minute the
trip request information and the locations of the available vehicles are compiled
to construct a weighted bipartite graph. The weight of an edge between a vehicle
node and a trip node represents the pick-up delay that the passenger associated
with the trip node would experience if the vehicle associated with the vehicle node
were chosen to serve the passenger. After constructing this weighted bipartite network,
the maximum matching algorithm can be used to find a subset of edges covering the
maximum number of trip nodes served within the tolerable delay, $\max(\Delta t)$.
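
One batch of this procedure can be sketched as follows. The sketch keeps only the vehicle-request pairs whose pick-up delay is within the tolerable delay and then computes a maximum matching on the resulting bipartite graph; beyond this feasibility test the edge weights are not used. It relies on a simple augmenting-path matching, and the function name and the delay matrix are hypothetical.

```python
def batch_dispatch(pickup_delay, max_delay):
    """Assign vehicles to the requests of one batch by maximum matching.

    pickup_delay[v][r] is the delay (in seconds) that request r would
    experience if vehicle v were dispatched; only pairs with
    delay <= max_delay are feasible. Returns {request: vehicle}.
    """
    n_vehicles = len(pickup_delay)
    n_requests = len(pickup_delay[0]) if pickup_delay else 0
    feasible = [
        [r for r in range(n_requests) if pickup_delay[v][r] <= max_delay]
        for v in range(n_vehicles)
    ]
    vehicle_of = [-1] * n_requests

    def try_assign(v, seen):
        for r in feasible[v]:
            if r in seen:
                continue
            seen.add(r)
            # Take a free request, or re-route the vehicle currently holding it.
            if vehicle_of[r] == -1 or try_assign(vehicle_of[r], seen):
                vehicle_of[r] = v
                return True
        return False

    for v in range(n_vehicles):
        try_assign(v, set())
    return {r: v for r, v in enumerate(vehicle_of) if v != -1}


# Hypothetical batch: 3 available vehicles (rows) and 3 requests (columns).
delays = [
    [60, 120, 500],
    [90, 400, 700],
    [600, 300, 200],
]
assignment = batch_dispatch(delays, max_delay=360)  # 6-min tolerable delay
print(len(assignment), "requests served:", assignment)
```

In this toy batch, giving the first request to its closest vehicle would leave the second vehicle with no feasible request, whereas the matching serves all three requests within the tolerable delay.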

Figure 4 shows the success rate of the two dispatching algorithms for a period
of 15 consecutive days, for $x = 1.2$, serving the trips within a certain tolerable delay.
As seen in this figure, the batch method (blue lines) provides a success rate which is
consistently above 92%, and is much higher than what is achieved by the sequential
on-the-fly method for $\max(\Delta t) = 6$ min. As reported in Extended Data Table 1,
the running times of the online version of the method are below 200 ms in the
worst-case scenarios on a standard Linux machine, indicating the feasibility of the
proposed approach for real-time optimization.

The warm-up phase used in the above-mentioned online optimizations consists
of first deploying each vehicle at a random intersection, then running the batch
optimization algorithm as described above on the 2 h of historical trip requests
that precede the period of interest. The shaded regions in Fig. 4a and c and in
Extended Data Fig. 3 represent the variation in the percentage of trips served, as
obtained by running the real-time optimization for each day multiple times, each
time reinitializing the warm-up phase with a distinct random initial deployment
of the fleet. The variations are quite small, showing that within 2 h the system’s
spatiotemporal distribution does not depend noticeably on the initial deployment.

Limiting node connectivity via trip connection time. We defined the
vehicle-shareability network in such a way that nodes that represent individual trips are
connected only if it is feasible for a vehicle to serve those trips one after the other 
without introducing any delay in their pick-up and drop-off times. Checking 
whether two trips satisfy such criteria requires knowledge of travel times in the 
city, which is estimated using the method described previously. Since this network 
definition puts no constraints on the connectivity apart from the feasibility of 
consecutively serving trips, the number of network links grows quickly with the 
increase in the number of trips. This is because trips separated by a large enough 
time gap between their drop-off and pick-up times can always be served by the 
same vehicle although they may be spatially far from each other. This leads to a 
very high connectivity in the vehicle-shareability network because most pairs of 
trips separated by enough time can satisfy this connectivity condition. To limit 
the number of edges in the network, and to make sure that the vehicles do not 
operate without any passenger onboard for too long leading to underutilization 
and an increase in the void ratio (the fraction of time vehicles operate without a 
passenger), we introduce an upper bound on the connection time between the 
trips. The connection time is defined as the time a vehicle operates without a 
passenger between the consecutive trips. 
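
The link rule with the connection-time bound can be sketched as follows. The sketch assumes that trip i is linked to trip j when a vehicle dropping off trip i can reach the origin of trip j by its pick-up time, and when the gap between that drop-off and pick-up does not exceed the bound (called delta in the code). The `Trip` class, the toy `travel_time` function and the all-pairs loop are illustrative assumptions rather than the implementation used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    pickup_time: float   # seconds since midnight
    dropoff_time: float  # seconds since midnight
    origin: int          # intersection id of the pick-up point
    destination: int     # intersection id of the drop-off point


def build_shareability_edges(trips, travel_time, delta):
    """Return directed links (i, j): one vehicle can serve trip i, then trip j.

    travel_time(u, v) gives the estimated travel time (s) between
    intersections u and v; delta is the upper bound (s) on the connection time.
    """
    edges = []
    for i, a in enumerate(trips):
        for j, b in enumerate(trips):
            if i == j:
                continue
            gap = b.pickup_time - a.dropoff_time
            if gap < 0 or gap > delta:
                continue  # trip j starts too early, or the vehicle idles too long
            if a.dropoff_time + travel_time(a.destination, b.origin) <= b.pickup_time:
                edges.append((i, j))
    return edges


def toy_travel_time(u, v):
    """Hypothetical travel times: 5 min per unit of difference in node id."""
    return 300.0 * abs(u - v)


trips = [
    Trip(0.0, 600.0, 0, 1),
    Trip(1000.0, 1500.0, 2, 3),
    Trip(3000.0, 3600.0, 1, 0),
]
print(build_shareability_edges(trips, toy_travel_time, delta=900.0))
print(build_shareability_edges(trips, toy_travel_time, delta=3000.0))
```

Raising the bound from 15 min to 50 min in this toy example adds two links, which illustrates how quickly connectivity grows when the connection time is left unconstrained.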

The first issue to address is how to set the bound $\delta$ on the trip connection time,
which is a parameter that can be used to trade off fleet size against vehicle and traffic
efficiency. On the one hand, when $\delta$ is decreased to 0 we approach a situation in
which each trip is served by a dedicated vehicle: a solution with maximum vehicle
utilization that is also optimal for traffic (if we assume that vehicles somehow
appear at the origin and disappear at the destination of the trip they serve), but
incurring prohibitive costs for the mobility operator. On the other hand, when $\delta$
grows excessively the fleet size is reduced, but this is at the expense of a decrease
in the operational and traffic efficiency because some vehicles may be on the road
for long times without any passenger on board between serving the trips. Thus,
how to set $\delta$ is an important design choice, which should be left in the hands of
mobility operators, traffic authorities and policy makers.

Extended Data Fig. 1 shows how we come up with a reasonable setting for $\delta$.
The plot reports both the minimum fleet size and the average fraction of
time a vehicle spends connecting consecutive trips (the void ratio) in seconds, for
increasing values of $\delta$. As expected, the former quantity decreases with $\delta$, while the
latter increases. For values of $\delta$ larger than 15 min, however, the vehicle fleet size
decreases only marginally, whereas the void ratio still increases. For this reason, for
the results reported in the main text we have set $\delta = 15$ min. For reference, the right
panel of Extended Data Fig. 1 reports the yearly analysis of minimum fleet size
(similar to what is reported in Extended Data Fig. 2) for $\delta = 10$ min and 20 min.

Vehicle utilization. A better understanding of the efficiency of the network-based
vehicle-trip assignment requires a closer look into the patterns of the utilization of 
the individual vehicles in the minimum fleet. The overall time each vehicle spends 
during its operation in a day consists of travelling with a passenger on board, 
without any passenger and on the way to pick up the next one, or waiting at the 
pick-up location of a new passenger. Ultimately, the goal in an efficient vehicle- 
trip assignment is to maximize overall utilization while minimizing the operation 
costs. This is achieved for each vehicle when the fraction of time a vehicle operates 
without a passenger on board is minimized. 

Extended Data Fig. 2 reports the yearly analysis of minimum fleet requirements, 
along with the corresponding daily number of trips. Whereas the number of daily 
trips clearly displays an increasing weekly pattern, the number of required vehicles 
remains fairly constant, with a dip on Sundays. The robustness of the fleet size 


despite large variation in the number of daily trips shows that the minimum fleet 
size can tolerate handling extra trips without needing extra vehicles. The addition 
of such trips certainly leads to higher vehicle utilization, as we show here. Extended 
Data Fig. 4 reports a breakdown of the deployed vehicles into the different phases 
of deployment—passenger onboard, en route to next passenger, waiting for next 
passenger—for a better understanding of the utilization patterns. 

Extended Data Fig. 5 reports vehicle-level performance using various temporal 

metrics. The vehicle start and end of operation time during the day in Extended 
Data Fig. 5a shows that on most days, minimum fleet assignment leads to high 
operation times for the majority of the vehicles. The reported plots in Extended 
Data Fig. 5b and c on some days clearly show the existence of a small fraction of 
under-used vehicles operating on average for less than two hours, serving what 
we call ‘special-purpose’ trips. These trips occur mostly on the weekend and are
spatiotemporally isolated, meaning that their existence requires new vehicles
because the existing vehicle-trip assignment cannot be rearranged to
accommodate these trips successfully.

A bin-packing model to describe fleet-size scaling. As shown in Extended Data
Fig. 6, for a large number of days with daily trips ranging from 350,000 to 550,000, 
there is only a small variation in the minimum fleet size. This pattern seems a bit 
counter-intuitive at first glance, because basic logic implies that an increase in the 
number of trips should somehow lead to increase in fleet size. Outside this range 
this expected increasing pattern holds and for a smaller number of trips we have 
a more-or-less linear scaling (see the result of supersampling in Extended Data 
Fig. 6c). 

To explain the saturation pattern observed in Fig. 2, we use a simple bin-packing 
model to show that the reason for fleet-size robustness within a certain range is 
related to an existing spatiotemporal capacity to accommodate more trips in the 
minimum fleet. Consider a set of N vehicles with a fixed spatiotemporal capacity 
to accommodate k trips during a day. The exact value of k depends on the average
duration of a trip during a given time of the day, and it is limited by a strict upper
bound equal to 24 h on the maximum vehicle operation time. We start with a
configuration where we have a certain number of trips Nx (where x < k) randomly 
distributed in the bins with a Poisson distribution. We start to add one trip at a 
time and randomly sample a small subset of n vehicles as candidate set (n is a 
hyperparameter of the model that we assume to be either 1 or 2). Two scenarios 
are possible: (1) a subset of the selected vehicles still have the capacity to accom- 
modate more trips, in which case we randomly select one of them and assign the 
trip to that vehicle; (2) none of the vehicles have spatiotemporal capacity to accom- 
modate the new trip, in which case we add a new vehicle to the system to accom- 
modate the new trip. We repeat this process and model the relationship between 
the number of vehicles and number of trips in this manner. 
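
The procedure above can be sketched as a short simulation. The sketch makes a few assumptions that are not specified in the text: initial loads are drawn independently from a Poisson distribution with mean x and clipped at the capacity k, candidate vehicles are sampled without replacement, and a newly added vehicle starts with the single trip that triggered it. All names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson sample (Knuth's method; adequate for small means)."""
    threshold, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_fleet_growth(n_vehicles, capacity, initial_mean, n_new_trips,
                          n_candidates=2, seed=0):
    """Return the fleet size after each added trip in the bin-packing model."""
    rng = random.Random(seed)
    # Initial configuration: about n_vehicles * initial_mean trips in the bins.
    loads = [min(capacity, poisson(rng, initial_mean)) for _ in range(n_vehicles)]
    fleet_sizes = []
    for _ in range(n_new_trips):
        candidates = rng.sample(range(len(loads)), min(n_candidates, len(loads)))
        free = [v for v in candidates if loads[v] < capacity]
        if free:
            loads[rng.choice(free)] += 1  # scenario (1): reuse spare capacity
        else:
            loads.append(1)               # scenario (2): add a new vehicle
        fleet_sizes.append(len(loads))
    return fleet_sizes


sizes = simulate_fleet_growth(n_vehicles=50, capacity=20, initial_mean=5,
                              n_new_trips=2000)
for added in (100, 500, 1000, 2000):
    print(f"after {added} added trips: fleet size {sizes[added - 1]}")
```

Printing the fleet size at a few checkpoints is enough to see whether the fleet stays flat while spare capacity is available and grows once most sampled vehicles are full.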

An interesting plateau-then-increase pattern emerges from this model, which
implies that for some intermediate ranges, the fleet size first increases and then
shows some robustness with respect to a further increase in the number of trips,
consistent with the pattern obtained from our minimum fleet optimization
approach in Fig. 2. This simple model suggests that the reason for the
minimum fleet-size robustness is that the probability of finding a vehicle which can 
successfully accommodate that new trip is still relatively high as many cars operate 
with a large unused spatiotemporal capacity when the number of trips is relatively 
low. The range of minimum fleet-size tolerance is determined by the maximum 
number of trips that a certain number of vehicles can serve in theory. This maxi- 
mum number depends on the spatiotemporal distribution of trips, especially the 
distribution of the trip durations. For instance, if the average trip duration in a 
day is 10-15 min, a vehicle can serve up to around 3-4 trips per hour assuming a 
5-min connection time between the trips on average. In this way the upper bound 
would be around 100 trips for vehicles that are active for most of the day. With 
this assumption the maximum number of trips a minimum fleet of around 6,000 
vehicles can tolerate is around 600,000 trips. Figure 2 and the results of the model 
in Extended Data Fig. 6d support this argument. 
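
The back-of-the-envelope estimate above can be written out step by step. The rounding of the daily figure to about 100 trips follows the text; the intermediate ranges are simple arithmetic on the stated assumptions (a 10-15 min average trip, a 5-min average connection time and a vehicle active for most of the day).

```latex
\begin{align*}
\text{time per served trip} &\approx (10\text{--}15) + 5 = 15\text{--}20\ \text{min},\\
\text{trips per vehicle per hour} &\approx \frac{60}{20}\ \text{to}\ \frac{60}{15} = 3\text{--}4,\\
\text{trips per vehicle per day} &\lesssim 24 \times (3\text{--}4) = 72\text{--}96 \approx 100,\\
\text{trips served by 6{,}000 vehicles} &\approx 6{,}000 \times 100 = 600{,}000.
\end{align*}
```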

Although the model in this section is an oversimplification and does not consider
the complex spatiotemporal constraints that determine whether a vehicle can
serve a trip, it does capture the saturation pattern represented in Fig. 2.
Extended Data Fig. 7 supports the idea that the robustness of the fleet size is due 
to the existing capacity in vehicles by showing how the metrics associated with 
vehicle utilization show a consistent increase in vehicle utilization for days with 
higher numbers of trips. Days with higher numbers of trips score higher average 
utilization per vehicle as can be seen in both the increase in the average time a 
vehicle spends on the road with a passenger on board for each day (see Extended 
Data Fig. 7a) and also in the increase in the average time vehicles spend waiting to 
pick up a passenger at the pick-up point (see Extended Data Fig. 7b). 

Multi-operator model. As briefly discussed in the main text, consider a situation
in which there is more than one mobility operator, each having access only to a
subset of trip demand data, and assume that the operators assign the vehicles in
their fleet to the trip demands they have access to without sharing information
with the other mobility operators. The question is to what extent the fleet size is
affected by the lack of information sharing between a certain number of mobility 
operators. This is equivalent to going from a global optimum to a local optimum, 
in which each vehicle receives limited information about adjacent trips and tries to 
maximize its utilization independent of other vehicles. This latter situation is the 
extreme limit at which the number of operators is very large and only a local opti- 
mum can be achieved. In the following, using a simplified model we try to address 
the cases for two and three mobility operators equally sharing the mobility market. 

For this purpose we randomly sample the trip demand data at each given point 
in time and divide the trip set into multiple subsets. For each trip subset we can 
build a vehicle-shareability network and do the minimum fleet optimization as 
described in this Letter. Each optimization leads to a minimum fleet size for each 
mobility operator. By comparing the sum of the fleet sizes for the multiple mobility 
operator case with the global minimum fleet size we can find out how far away we 
are from the global optimum. 
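The comparison described above can be sketched on synthetic data. The example below is a toy illustration, not the authors' code: it assumes that the minimum fleet equals a minimum path cover of the directed acyclic shareability network (number of trips minus a maximum bipartite matching), places trips on a one-dimensional line with unit speed, and ignores any limit on connection time. The names `can_chain`, `min_fleet` and `split_fleet` are hypothetical.

```python
import random


def can_chain(a, b, speed=1.0):
    """True if one vehicle can serve trip a and then trip b.

    A trip is (t_start, t_end, x_pickup, x_dropoff) on a 1D line (toy geometry).
    """
    travel = abs(b[2] - a[3]) / speed
    return a[1] + travel <= b[0]


def min_fleet(trips):
    """Minimum path cover of the chainable-trip DAG: n minus a maximum matching."""
    n = len(trips)
    succ = [[j for j in range(n) if j != i and can_chain(trips[i], trips[j])]
            for i in range(n)]
    match_to = [-1] * n  # match_to[j] == i means trip j directly follows trip i

    def augment(i, seen):
        # Standard augmenting-path search for bipartite matching.
        for j in succ[i]:
            if j not in seen:
                seen.add(j)
                if match_to[j] == -1 or augment(match_to[j], seen):
                    match_to[j] = i
                    return True
        return False

    matched = sum(augment(i, set()) for i in range(n))
    return n - matched


def split_fleet(trips, k, rng):
    """Sum of minimum fleets when trips are split at random among k operators."""
    shuffled = trips[:]
    rng.shuffle(shuffled)
    return sum(min_fleet(shuffled[i::k]) for i in range(k))


rng = random.Random(0)
trips = []
for _ in range(120):
    start = rng.uniform(0, 600)
    pickup, dropoff = rng.uniform(0, 10), rng.uniform(0, 10)
    trips.append((start, start + abs(dropoff - pickup) + 1.0, pickup, dropoff))

global_fleet = min_fleet(trips)
for k in (2, 3):
    total = split_fleet(trips, k, rng)
    extra = 100 * (total - global_fleet) / global_fleet
    print(f"{k} operators: {total} vehicles, {extra:.1f}% above the single-operator fleet")
```

Because every chain that is feasible within a subset is also feasible in the full network, the summed fleet can never be smaller than the single-operator fleet; the size of the gap on real data is what Extended Data Fig. 8 reports.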

Extended Data Fig. 8 shows the temporal pattern of the sum of fleet sizes for a 
sample of 100 days from New York City taxi trip data. To obtain a good estimate 
for the sum of fleet sizes, we have divided the trip set in each day into two and three 
equally sized subsets by random subsampling. We repeated the random 
subsampling several times, each time performing the vehicle-shareability 
network optimization to find the fleet size for each subset. The average fleet 
size obtained from several random subsamplings each day is then presented in Extended Data 
Fig. 8a and b. As shown in Extended Data Fig. 8b, the transition from a 
monopolistic to an oligopolistic market incurs a small drop in efficiency, 
quantifiable at about 4%-6% for two-operator markets and about 6%-10% for 
three-operator markets. A further increase in the number of operators leads 
to higher inefficiency in terms of fleet size, as one moves away from the 
global optimum achievable in the monopolistic market towards an increasingly 
fragmented one. If the number of disjoint operators increases further, the 
total size of the fleet keeps increasing owing to the lack of communication 
between mobility operators, even when each of them tries to optimize its 
fleet size based on the trip-demand information it receives. The fact that 
two or three operators holding equal shares of the mobility market incur 
only a small drop in efficiency in terms of fleet size shows that the 
minimum fleet-size optimization using the network-based approach for two or 
three independent operators is not far from the global optimum. 

Fleet-size inflation due to rare events. The analysis of historical data has shown 
that our model provides a robust improvement (the reduction in fleet size) on 
previous models. However, an inflation of the optimal fleet size could occur as a 
result of rare, unusual demand patterns. For instance, if a sudden burst in the num- 
ber of trips occurs around a given location but with diverging destinations, these 
trips will not be connected to each other in the vehicle-shareability network. Thus, 
for such cases to be served, the trips require separate vehicles from the existing 
pool on the road or even extra vehicles. These special events inflate the number 
of vehicles required because the nodes added to the vehicle-shareability network 
can have sparse or no connectivity to other nodes in the network. Although rare, 
such cases of trip-demand bursts can occur after events such as sports matches or 
concerts. However, based on our historical analysis it is evident that these outlying 
patterns only rarely lead to any inflation in the number of vehicles required to 
serve all the demand. 

Data and code availability. All data processed during the course of this study 
are included in this Letter and its Supplementary Information. The code for gen- 
erating the shareability networks and optimal dispatching is subject to licensing 
and could be made available upon request to the authors. New York City taxi data 
used in the study can be downloaded at 
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml. 

Extended Data Fig. 3 | Intra-annual comparison between batch and 
on-the-fly models. Plots showing the percentage of trips served within 
the next 6 min from the time the trip requests are received. In the batch 
model, advance knowledge of the trip information is restricted to only 
the next minute (batch time). The trips in each batch are assigned to the 
available vehicles using the online version of the network approach from 
the minimum fleet of size 7,440. This approach scores a consistently 
higher percentage (90%) than does the on-the-fly model. In the on-the-fly 
model, trips are assigned to the closest available vehicle. To achieve 
the same level of service using the on-the-fly model, the fleet size must 
increase by more than 30% (see Fig. 4b). The shaded region represents the 
σ variations when the vehicle warm-up phase is reinitialized 50 times, 
where σ = max(%) − min(%) is the difference between the percentage of 
served trips achieved for the runs that score maximum and minimum 
values for each day. 

Extended Data Fig. 4 | Detailed temporal patterns of the minimum 
fleet. At each time during a day, each active vehicle in the minimum fleet 
set operates in one of three possible modes: (1) empty of passengers and 
on the way to pick up a passenger, (2) empty of passengers and waiting at 
a passenger’s pick-up location to pick up, (3) serving, with a passenger on 
board. The number of vehicles operating in each of these modes computed 
for each minute during the day follows regular daily and weekly patterns, 
as shown by the three coloured curves for all months in the year 
(horizontal axis: time, day of the month). The 
total fleet size active on the road (black-to-grey curves) demonstrates 
robustness, because most of the vehicles in the minimum fleet are active at 
all times during the day (see also Extended Data Fig. 5a and b). Different 
panels correspond to different months, and the colour intensity is used to 
differentiate different days of the week. 

Extended Data Fig. 5 | Vehicle-level performance in the minimum fleet 
assignment. a, Stacked horizontal bar plots showing the start and end 
of the operation time (left and right ends of the stacked bars) for each 
vehicle in the minimum fleet assignment for various days of the week. 
The day in each panel represents results for days in the second week of 
each month. Vehicle active times are represented by a very thin coloured 
bar. The vehicles are sorted by the start of operation time, and stacking 
them horizontally creates each plot for each day. In all days (except for 
the outlier day of the second Saturday in March 2011), the patterns show 
high efficiency, with the majority of vehicles starting early in the day and 
operating until the end of the day. b, Stacked horizontal bar plots, this 
time representing the total operation time of the vehicle (the length of the 
bar). The bars are sorted based on the vehicle’s total operation time, the 
lowest bar corresponding to the vehicle with the longest operation time. 

A distinct pattern emerges on most of the weekends and on some days 
during the week. A substantial percentage of vehicles on most weekends 
operate for a short time to serve a small subset of trips, which we refer to as 
special-demand trips. We believe that the existence of these trips requires 
additional vehicles because of the way their pick-up and drop-off times 
and locations are distributed spatiotemporally. c, The q-q plot (where 
the quantile is the fraction of points below the given value) showing the 
percentage of trips served (vertical axis), using the vehicle-shareability 
minimum fleet optimization, with the percentage of vehicles represented 
on the horizontal axis. Vehicles are sorted on the basis of their total 
operation time, that is, the vehicles with longer operation times appear 
to the left of those with shorter operation times on the horizontal axis of 
these plots. Each panel corresponds to a day of the week and the curves 
in each panel represent all such days in the entire year (for example, all 
Mondays). On most weekends and consistent with the patterns observed 
in b, a large percentage of vehicles (between 5% and 10%) serve only a 
very small percentage of trips (<1%). This can be observed from the cusps 
appearing near the top of some of the curves. 

Extended Data Fig. 6 | Modelling minimum fleet size scaling with the 
number of trips. a, Scatter plot showing the average operation time of 
vehicles with optimal dispatching for different days versus the average 
number of trips per vehicle for each day. The former quantity scales 
linearly with the average number of trips per vehicle. This holds despite 
the fact that the fleet size manifests a saturation pattern as the number of 
trips grows. b, The coefficient of proportionality between the two quantities 
in a differs between weekdays and weekends. The coefficient is slightly 
lower for Saturdays (blue) and much lower on Sundays (green) compared 
to that of weekdays. c, Plot showing the interplay between the minimum 
fleet size and the number of trips for each simulated day, to show 
how the fleet size changes as the number of trips greatly increases. The 
supersampling is done by combining the demand for similar days in two 
and three successive weeks. The number of vehicles shows linear growth 
with a ripple-like pattern of saturation and increase. d, Plot showing the 
interplay between the fleet size and the number of trips, as simulated 
using a simple bin-packing model. The oversimplified model described 
in Methods can still capture the ripple-like saturation/increase pattern. 

Extended Data Fig. 7 | Average vehicle utilization in the minimum fleet 
assignment versus the number of daily trips. a, Scatter plot showing the 
average total time with a passenger on board per vehicle in a day versus 
the total number of daily trips for that day. Each point in the scatter plot 
represents a day. The average total time with a passenger on board, which 
is a measure of vehicle utilization in the minimum fleet assignment, shows 
an overall increasing pattern with the increase in the number of daily 
trips that is consistent with the fact that the minimum fleet size shows 
robustness. b, Scatter plot showing the average total waiting time to pick 
up passengers per vehicle versus the number of daily trips for each day. 
The average total waiting time decreases as the number of daily trips 
increases, which again can be interpreted as an increase in the utilization 
of vehicles. The observed patterns justify the unused capacity assumption 
used to develop the bin-packing model (see Methods and Extended Data 
Fig. 6). 

Extended Data Fig. 8 | Efficiency comparison between single and 
multiple mobility operators. The optimal fleet size in the single-operator 
and the multi-operator mobility service in each day for the first 100 days 
(1 January corresponds to day index 1) in the year 2011. In the case of 
multiple operators, trips are randomly assigned to one of the operators 
in equal proportions, and network-based optimization is performed by 
each operator independently. The numbers of vehicles needed by each 
operator are then summed and the number for each operator is shown 
in a. b, Fleet-size percentage increase plot showing how the transition 
from a monopolistic to an oligopolistic market incurs a drop in efficiency 
of 4%-6% for a two-operator market, and of about 6%-10% for a 
three-operator market. The further increase in the number of operators leads 
to higher inefficiency in terms of fleet size as it moves away from the 
global optimum achievable in the monopolistic market to an increasingly 
fragmented market. 

Extended Data Table 1 | Real-time computation run times in milliseconds 


x      $\Delta t$ (min)   $t_g$ (ms)           $t_{tva}$ (ms) 
                          average    max       average    max 
1.2    2                  12         27        2          4 
1.2    3                  14         29        3          8 
1.2    4                  15         30        4          16 
1.2    5                  17         37        6          29 
1.2    6                  20         49        8          46 
2.0    2                  26         42        5          7 
2.0    3                  28         46        7          14 
2.0    4                  33         51        12         28 
2.0    5                  A          68        20         55 
2.0    6                  50         93        32         100 

Considering the day with the highest number of trips in the year, which is day 43 for the year 2011 with around 505,000 trips, we compute the breakdown of the run times for building a bipartite 
trip-vehicle graph, $t_g$, and finding the optimum trip-to-vehicle assignment, $t_{tva}$, by receiving the trip requests in the next minute, on the basis of the proposed online network-based batching model. The 
total run time $t_g + t_{tva}$ per batch remains under 100 ms for x = 1.2 and under 200 ms for x = 2.0. This shows the practicality of the proposed method from the computational point of view. We have also 
varied the maximum delay, $\Delta t$, between 2 min and 6 min. The average is computed for all the minutes in the day, while the maximum times correspond to the batch computation with the maximum 
run time. The results are based on ten separate runs for the entire day, each time reinitializing the fleet deployment warm-up phase, as described in Methods. The experiments were performed on a 
Linux workstation equipped with an Intel Core i7-3930K central processing unit (CPU) running at 3.20 GHz and 32 GB of random access memory (RAM). For maximum fairness, the running times are 
based on the actual times spent running the program, not on the CPU clocks assigned to the process. 
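
The stated run-time budgets can be checked directly against the maximum columns of Extended Data Table 1. The sketch below is only a consistency check: it adds the two maxima for each setting, which is an upper bound on the true per-batch total because the two maxima need not occur in the same batch. The dictionary layout and names are illustrative.

```python
# Maximum run times (ms) from Extended Data Table 1, keyed by x and then by the
# maximum delay in minutes. Each entry is (max graph-building time, max assignment time).
MAX_MS = {
    1.2: {2: (27, 4), 3: (29, 8), 4: (30, 16), 5: (37, 29), 6: (49, 46)},
    2.0: {2: (42, 7), 3: (46, 14), 4: (51, 28), 5: (68, 55), 6: (93, 100)},
}
BUDGET_MS = {1.2: 100, 2.0: 200}  # per-batch limits stated in the caption


def worst_case_total(x):
    """Largest sum of the two maxima over all delays for a given x."""
    return max(t_graph + t_assign for t_graph, t_assign in MAX_MS[x].values())


for x, budget in BUDGET_MS.items():
    total = worst_case_total(x)
    print(f"x = {x}: worst-case total {total} ms, budget {budget} ms")
```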

Self-reporting and self-regulating liquid crystals 


Young-Ki Kim, Xiaoguang Wang, Pranati Mondkar, Emre Bukusoglu & Nicholas L. Abbott 


Liquid crystals (LCs) are anisotropic fluids that combine the 
long-range order of crystals with the mobility of liquids. 
This combination of properties has been widely used to create 
reconfigurable materials that optically report information about 
their environment, such as changes in electric fields (smart-phone 
displays), temperature (thermometers) or mechanical shear, and 
the arrival of chemical and biological stimuli (sensors). An unmet 
need exists, however, for responsive materials that not only report 
their environment but also transform it through self-regulated 
chemical interactions. Here we show that a range of stimuli can 
trigger pulsatile (transient) or continuous release of microcargo 
(aqueous microdroplets or solid microparticles and their chemical 
contents) that is trapped initially within LCs. The resulting LC 
materials self-report and self-regulate their chemical response 
to targeted physical, chemical and biological events in ways that 
can be preprogrammed through an interplay of elastic, electrical 
double-layer, buoyant and shear forces in diverse geometries 
(such as wells, films and emulsion droplets). These LC materials 
can carry out complex functions that go beyond the capabilities of 
conventional materials used for controlled microcargo release, such 
as optically reporting a stimulus (for example, mechanical shear 
stresses generated by motile bacteria) and then responding in a self- 
regulated manner via a feedback loop (for example, to release the 
minimum amount of biocidal agent required to cause bacterial cell 
death). 

We use nematic phases as representative LCs. Nematic LC phases 
comprise molecules that assume a preferred orientation, called the 
director n. This leads to anisotropic optical properties (birefringence) 
and elasticity, the latter allowing energy to be stored in strained states 
of LCs at rest (for example, via bending, splaying or twisting). When 
a dispersed microphase of an immiscible liquid, solid or gas (guest) 
is introduced into an LC host, the ordering of LCs around the guest 
microphase is determined by a competition between the elastic energy 
($KR$) arising from the strain of the LC and an orientation-dependent 
interfacial energy at the LC-guest interface ($WR^2$), where K is the Frank 
elastic constant of the LC, W is the surface-anchoring energy density 
and R is the radius of the guest. If $WR^2 > KR$ (that is, R > K/W), the 
guest will elastically strain the LC and thus can generate a strong repulsive 
force (elastic repulsion, $F_{e1}$) between the guest and the boundaries 
of the nematic LC (Fig. 1a). The strain and associated topological 
defects will also mediate interactions between guest microphases, which 
can prevent contact or coalescence of liquid microphases. For typical 
nematic LCs formed from small organic molecules (thermotropic 
LCs), $K \approx 10^{-11}$ N and $W \approx 10^{-5}$ J m$^{-2}$; thus, $K/W \approx 1$ µm and 
the associated elastic energy is about 2,400 $k_B T$ (Methods), where $k_B$ 
is the Boltzmann constant and T is the temperature. In the absence of 
external forces, therefore, we predict that dispersed microphases with 
R > K/W would be sequestered by elastic forces within a bulk nematic 
phase and prevented from escaping into the surrounding environment. 
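
The order-of-magnitude estimate above can be reproduced in a few lines. This is a sketch under stated assumptions: it takes K of about 10^-11 N and W of about 10^-5 J m^-2 (values consistent with K/W of about 1 µm), evaluates the elastic energy as KR at R = K/W, and assumes room temperature of 298 K, which is not stated in the text.

```python
# Order-of-magnitude check of the elastic trapping energy.
K = 1e-11             # Frank elastic constant, N (assumed order of magnitude)
W = 1e-5              # surface-anchoring energy density, J m^-2 (assumed)
K_B = 1.380649e-23    # Boltzmann constant, J K^-1 (exact SI value)
T = 298.0             # assumed room temperature, K

R_c = K / W                       # crossover radius, m
elastic_energy = K * R_c          # elastic energy KR evaluated at R = K/W, J
ratio = elastic_energy / (K_B * T)

print(f"K/W = {R_c * 1e6:.1f} um; KR = {ratio:.0f} kBT")
```

With these inputs the ratio comes out in the low thousands of $k_B T$, in line with the figure of about 2,400 $k_B T$ quoted above, so thermal motion alone should not free a micrometre-scale guest.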

To demonstrate the elastic trapping of guest microphases in LCs, we 
dispersed aqueous microdroplets (0.5 µm < R < 3 µm) containing a 
model chemical solute (a red dye for visualization) and the surfactant 
sodium dodecyl sulphate (SDS) in nematic 4′-pentyl-4-biphenylcarbonitrile 
(5CB; Fig. 1a), and then filled a mini-well (3.5 mm in depth) 
with the dispersion (Fig. 1b). The SDS adsorbed at the interface between 
the aqueous microdroplets and the LC and aligned n perpendicular 
to the local droplet interface; this is called homeotropic anchoring. 
Accordingly, each aqueous microdroplet was surrounded by a region 
of strained LCs that included a point topological defect (called a 
hyperbolic hedgehog; Fig. 1a). At the interface between the LC and the 
overlying bulk aqueous phase, the LC adopted a parallel orientation 
(planar anchoring), leading to a bright optical appearance of the system 
(left inset in Fig. 1e). As shown in Fig. 1c and Extended Data Fig. 1, the 
aqueous environment contacting the LC remained free of red tracer 
after four days at room temperature. This result is consistent with the 
effects of the elastic repulsion of the guest microdroplets away from 
the interface between the LC and the overlying aqueous phase by a force 
$F_{e1} = -A^2 B \pi K [R/(h_1 + R)]^4$, where A is a numerical factor (A = 0 for 
R < K/W or for an isotropic phase), B is an anchoring-dependent 
constant (B = 3/4 and B = 1/2 for the parallel and perpendicular 
orientation, respectively, of n at the interface between the LCs and the bulk 
aqueous phase) and $h_1$ is the distance between the microdroplet surface 
and the LC interface to the bulk aqueous phase (Fig. 1a). The trapping 
of microdroplets within LCs was observed to occur regardless of the 
relative density of the microdroplets ($\rho_{aq}$) and the LCs ($\rho_{LC}$) because 
$F_{e1}$ is much larger than buoyant forces ($F_b$) at room temperature; for 
R = 3 µm and 5CB, $|F_{e1}/F_b| > 500$ at $h_1/R < 1$ (Methods). 

Thermal and chemical stimuli can trigger optical responses in LCs 
via changes in LC ordering. For example, 5CB undergoes a 
first-order transition from a nematic (N) to an isotropic (I) phase at 
a temperature of 35 °C. Alternatively, isothermal phase transitions 
can be induced by absorption of solutes. We found that heating of 
5CB containing microdroplets to above 35 °C from below (through 
contact with a warm body) led not only to an N-to-I phase transition 
accompanied by an optical response of the LCs (bright to dark; see 
insets and dashed line in Fig. 1e), but also to the ejection of 
microdroplets and red tracer into the overlying aqueous environment (Fig. 1d and 
solid line in Fig. 1e). The release occurred irrespective of the relative 
magnitudes of $\rho_{aq}$ and $\rho_{LC}$, including in conditions under which the 
microdroplets sediment downwards and away from the interface to the 
overlying aqueous environment ($\rho_{aq} > \rho_{LC}$). Surprisingly, the release was 
transient, coinciding with the time period of the phase transition 
(Fig. 1e). After the phase transition, we did not measure any additional 
release for a period of 24 h (Extended Data Fig. 1). A second pulse of 
solute was released when the system was cooled back to 25 °C to 
reform the birefringent N phase (Fig. 1f). We subsequently repeated 
the heating and cooling cycles and observed that, with each optical 
response, a well defined pulse of microdroplets was ejected into the 
overlying aqueous phase (Fig. 1g). After 20 cycles, the mass of solute 
dispensed into the aqueous environment was linearly proportional to 
the initial concentration of aqueous microdroplets, $C_{aq}$, in the LCs 
(Fig. 1h, i), revealing a high level of control. Thermal release was also 
conveniently initiated by ohmic heating of a thin, electrically resistive 
film supporting the LCs (Extended Data Fig. 1). 

We determined that the pulsatile ejection of microdroplets accom- 
panied the upward motion of the N-I interface towards the overlying 
aqueous environment (Extended Data Fig. 1). Microscopic observations 
revealed that repulsive elastic interactions between the aqueous 

Fig. 1 | Pulsatile ejection of microdroplets from LCs and accompanying 
LC optical responses, as triggered by N-I phase transitions. a, Molecular 
structure of 5CB and optical micrograph (bright field) of an aqueous 
microdroplet (containing 9 mM of SDS and red tracer) in nematic 5CB 
with a reconstructed LC director profile. $F_{e1}$ is the repulsive elastic force 
acting between the aqueous microdroplet and the interface between the 
LC and the overlying aqueous phase. Scale bar, 10 µm. b, Illustration of a 
dispersion of aqueous microdroplets in an LC that is hosted in a mini-well 
and submerged under a bulk aqueous phase that is initially free of tracer. 

c, d, f, h, Sequential photographs of the LC-filled mini-well containing the 
initial N phase (c) and after N-to-I (d), N-to-I-to-N (f) and 10 phase 
transitions (h); see Supplementary Video 1. The phase transitions were 
triggered by heating and cooling between 25°C and 50°C. Scale bar, 3mm. 
e, Mass of tracer released (solid line) from LC as a function of time before 
(black circles) and after N-to-I phase transition (red circles); the 
accompanying optical response of the LC is shown by the dashed line. The 
insets show micrographs of mini-wells (of diameter 3 mm) between 
crossed polars (top view). g, Mass of tracer released as a function of the 


microdroplets and the moving N-I interface pushed the microdroplets 
ahead of the interface in a manner that depended on the size of the 
microdroplets (Fig. 1j, k). Past studies have observed N-I interfaces 
to transport adsorbed colloids, but they have not reported the elastic 
levitation of colloids ahead of an N-I interface, as is required for 
triggering the release of the microdroplet shown in Fig. 1. The elastic 
force, evaluated as $F_{e2} = +A^2 B \pi K (R/z)^4$, where B = 3/4 and z is the 
vertical position of the centre of the microdroplet relative to an N-I 
interface (z = 0), elastically levitates the microdroplets to a height 
defined by the net force $F_1^{net}(z) = 0$ above the N-I interface (see Fig. 1l 
and Methods). In the absence of $F_{e2}$, the microdroplets will not be 
transported by the rising N-I interface. Microdroplets pushed by $F_{e2}$ 
ahead of the moving N-I interface also experience a downward Stokes 
drag force $F_s = -6\pi\eta_{LC} v_{aq} R$, where $\eta_{LC}$ is the dynamic viscosity of the 
LC and $v_{aq}$ is the velocity of the microdroplet (which is equal to the 
velocity $v_{NI}$ of the moving N-I interface for microdroplets transported 
by the N-I interface). If $F_s$ exceeds the maximum value of $F_{e2}(z)$ 
(Fig. 1l), microdroplets break through the moving N-I interface. For 
$(z - R)/R \ll 1$, $F_{e2}$ is independent of R, whereas $F_s$ scales linearly with R; 

number of phase transitions. Red and blue circles indicate mass released 
after N-to-I and I-to-N phase transitions, respectively. a.u., arbitrary units. 
i, Mass of tracer released after 20 phase transitions as a function of the 
microdroplet concentration ($C_{aq}$) initially in the LC. The data points show 
mean values and the error bars represent 1 s.d. (n > 3). j, k, Sequential 
micrographs showing the transport of small (dashed circles; R < R*), but 
not of large (solid circles; R > R*), microdroplets by a moving N-I 
interface (yellow arrows; $v_{NI}$ = 10 µm s$^{-1}$) upon heating. Scale bar, 20 µm. 
l, Calculated net force $F_1^{net}$ acting on a microdroplet (R = 1.5 µm) near 
an N-I interface. The insets show microdroplets for z > R (red line), 
−R < z < R (blue line), and z < −R (black line). Red and blue arrows 
indicate the forces that favour and oppose the ejection of microdroplets, 
respectively. $F_{e2}$ is the repulsive elastic force acting between the aqueous 
microdroplet dispersed in the N phase and the N-I interface, which is 
modified by an additional elastic force and an interfacial tension force when the 
aqueous microdroplet penetrates that N-I interface (Methods). m, 
Calculated dependence of R* on $v_{NI}$. The red dashed line indicates R* at 
$v_{NI}$ = 10 µm s$^{-1}$, which coincides with the experimental conditions. 


thus, for each value of $v_{NI}$, our model defines a critical microdroplet 
radius R* above which microdroplets are not transported by the moving 
N-I interface (Fig. 1m and Extended Data Fig. 2). Our model 
predicts R* = 10.2 µm for $v_{NI}$ = 10 µm s$^{-1}$ (Fig. 1m), in good agreement 
with our experiments (Fig. 1j, k; R* = 10 ± 1 µm). Our model also 
predicts that microdroplets with R = 1.5 µm pushed by an N-I interface 
with $v_{NI}$ = 10 µm s$^{-1}$ will be able to reach within 60 nm of the interface 
of an overlying aqueous phase. At this separation, attractive interfacial 
forces, such as van der Waals and electrical double-layer forces (see 
below), mediate the fusion of the microdroplets with the overlying 
aqueous phase (Extended Data Fig. 3). 
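
The origin of a critical radius can be seen from a rough force balance. This is a simplified sketch rather than the full model in Methods: it keeps only the elastic force between the microdroplet and the N-I interface, written here as $F_{e2}$, and the Stokes drag $F_s$, and it assumes that the elastic force reaches its largest value at close approach ($z \to R$). Here $\eta_{LC}$ is the dynamic viscosity of the LC and $v_{NI}$ is the velocity of the interface.

```latex
\begin{align}
  |F_s| &= 6\pi\,\eta_{LC}\,v_{NI}\,R, \\
  F_{e2}^{\max} &\approx A^{2} B \pi K \qquad (z \to R), \\
  |F_s| = F_{e2}^{\max}
  \;&\Rightarrow\;
  R^{*} \approx \frac{A^{2} B K}{6\,\eta_{LC}\,v_{NI}} \;\propto\; \frac{1}{v_{NI}} .
\end{align}
```

Under these assumptions, faster interfaces carry only smaller microdroplets. The numerical value of R* depends on A and $\eta_{LC}$, which are not given in this section, so the sketch captures the scaling only.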

Modified versions of the above-described elastostatic model can 
account for the ejection of microdroplets by the upward motion of 
the N-I interface during cooling (Methods and Extended Data Fig. 4). 
In addition, the model predicts that tuning of the elastic properties 
of the LC, which can be achieved using light, temperature and 
chemical additives, can trigger optical responses in LCs along with 
a continuous release of microdroplets; these predictions were verified 
by experiments (see Extended Data Figs. 5-7 and Methods). 

Fig. 2 | Isothermal triggering of release of microdroplets dispersed in 
LCs by interfacial charge interactions. a-f, Sequential micrographs 
(crossed polars, top view, a-c) and corresponding illustrations (side view, 
d-f) of an LC film (thickness 40 µm, hosted in a square grid) containing 
microdroplets (concentration $C_{SDS}$ = 9 mM and red tracer) before (a, d) 
and after addition of the cationic amphiphile DTAB (10 mM) to the 
overlying aqueous phase, at 0 (b, e) and 60 min (c, f). P and A indicate the 
orientations of the polarizer and analyser, respectively. Red and blue 
arrows indicate the forces that favour (electrical double-layer force, $F_{edl}$) 
and oppose (elastic repulsion, $F_{e1}$) the ejection of microdroplets, 
respectively. Green (in e, f) and white (inset in f) circles with tails represent 
cationic (DTAB) and anionic (SDS) surfactants, respectively. Scale bar, 
200 µm. g, Mass of tracer released from the LC following the addition of 
DTAB at concentrations $C_{DTAB}$ = 2 mM (upside-down triangles), 5 mM 
(empty circles), 10 mM (triangles) or SDS at $C_{SDS}$ = 5 mM (full circles) to 


The orientations of LCs respond to interfacial interactions, which 
can be modulated by the adsorption of specific synthetic and biological 
molecules, as well as chemical transformations (for example, enzymatic 
or photochemical) of adsorbed species. We found that specific 
interfacial events can introduce interactions (for example, interfacial charge 
interactions) that override the elastic trapping of microdroplets. For 
example, at room temperature, microdroplets containing anionic 
amphiphile (SDS) and solutes were elastically trapped within supported 
films of 5CB (thickness, 40 µm) immersed under an aqueous phase 
(Fig. 2a, d). The LC films have bright optical appearances due to the 
planar anchoring of the LCs at the interface to the bulk aqueous phase. 
However, addition of a cationic amphiphile (dodecyltrimethylammonium 
bromide; DTAB) to the bulk aqueous environment triggered an 
LC anchoring transition to a homeotropic orientation. The anchoring 
transition caused an optical response from the LC (bright to dark; see 
Fig. 2a, b) and the release of microdroplets into the overlying aqueous 
phase (Fig. 2a-g). By contrast, addition of anionic amphiphiles to the 
overlying aqueous phase did not trigger release of microdroplets 
(Fig. 2g). The rate of microdroplet release was correlated closely with 
the zeta potential, ζ, of the interface between the LC and aqueous phase, 
which is controlled by addition of charged amphiphiles (Fig. 2g, h). 
These observations indicate that release of microdroplets is triggered 
by electrical double-layer forces ($F_{edl}$) acting between the 
microdroplets and the interface between the LC and the bulk aqueous phase. 
These forces can overcome the elastic forces ($F_{e1}$) that trap the 
microdroplet initially. We evaluated the double-layer forces as 
$F_{edl} = -4\pi\varepsilon_0\varepsilon (R/\lambda)(k_B T/e_c)^2 Y_1 Y_2 e^{-h_1/\lambda}$, where λ is the Debye 
screening length, ε is the relative dielectric constant measured along the 
interface normal, $\varepsilon_0$ is the vacuum permittivity, $e_c$ is the elementary charge, 
and $Y_1$ and $Y_2$ are the effective surface potentials of the microdroplet 
and the LC interface, respectively (see Methods). The calculated net 


the overlying aqueous phase. The symbols show mean values and the error 
bars are 1 s.d. (n > 5). h, Zeta potentials (ζ) of LC-aqueous interfaces 
without amphiphiles (white bar) and with SDS (grey bar) or DTAB (green 
bars). Data show mean values and the error bars are 1 s.d. (n > 3). 
i, Calculated net force ($F_2^{net} = F_{edl} + F_{e1}$) acting on a microdroplet 
($C_{SDS}$ = 9 mM) with R = 1.5 µm (red), 3 µm (black) and 5 µm (blue) after 
the addition of DTAB (10 mM; solid lines) or SDS (5 mM, R = 1.5 µm; red 
dashed line) to the overlying aqueous phase, plotted as a function of $h_1$. 
The inset shows the corresponding elastic ($F_{e1}$; grey) and electrical 
double-layer ($F_{edl}$; orange dashed and solid lines are for $C_{SDS}$ = 5 mM and 
$C_{DTAB}$ = 10 mM in the overlying aqueous phase, respectively) forces for 
R = 1.5 µm. j, Magnitude of the kinetic barrier in i as a function of R. 
k, Measured fraction of microdroplets released with R = 1-2 µm, 
2.5-3.5 µm and 4.5-5.5 µm. Data are mean values and error bars are 1 s.d. 
(n > 10). 


force ($F_2^{net} = F_{edl} + F_{e1}$) experienced by microdroplets near the LC 
interface confirmed that addition of DTAB generates an attractive $F_{edl}$ that 
overcomes the repulsive $F_{e1}$ (Fig. 2i). We also found a strong correlation 
between ζ, $F_{edl}$ and the microdroplet release rate (Extended Data Fig. 8). 
We examined the influence of microdroplet size (R) on $F_2^{net}$ and the 
experimental release of microdroplets. In the limit of $h_1 \ll R$ 
(close approach), larger microdroplets experience a larger attractive 
$F_2^{net}$ due to $F_{edl}$ (in this limit, $F_{e1}$ is independent of R whereas $F_{edl}$ scales 
linearly with R). Importantly, however, the distance dependence of $F_{e1}$ 
and $F_{edl}$ leads to a repulsive kinetic barrier ($F_2^{net} < 0$) that grows with 
R (Fig. 2i, j). Specifically, our model predicts that a kinetic barrier will 
prevent ejection of microdroplets with R > 2.3 µm. Our experimental 
observations confirmed the existence of the size-dependent kinetic 
barrier (Fig. 2k). The role of interfacial charge in triggering microdroplet 
release was also confirmed by experiments involving changes in pH 
and the adsorption of polyelectrolytes (Extended Data Fig. 8). In 
addition, we demonstrated that adsorption of anionic bacterial 
lipopolysaccharides (Extended Data Fig. 9) can introduce interfacial charge 
interactions that trigger changes in the optical appearance of LCs and 
the release of microdroplets containing anti-bacterial agents (for 
example, DTAB) from the LCs. 

Past studies'~> have demonstrated that interfacial shear stresses can 
change the orientations of LCs; here we show that this leads to changes
in both $F_{\mathrm{edl}}(\varepsilon)$ and $F_{\mathrm{el}}(B)$ due to the orientation dependence of $\varepsilon$ and
$B$, respectively. The orientation of the LC director changes when interfacial
shear stresses exceed the elastic torque required to rotate the LC
(which is approximately equal to $K(\partial^2\psi/\partial d^2)$, where $\psi$ is the angle of
the LC director from the surface normal and $d$ is distance from the
LC interface’). For a typical nematic LC (K © 1077 N), we calculate 
that an interfacial shear stress of 1 Nm is sufficient to reorient the 
LC director within an interfacial layer with thickness of about 10~° 
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Fig. 3 | Optical response of LCs and release of microdroplets from LCs, 
as triggered by interfacial shear stresses. a–d, Optical response of LC
films (of thickness 40 µm, hosted in a square grid supported on a treated-glass
slide; crossed polars) with initially homeotropic (a, b; achieved
using 2 mM SDS in the aqueous phase) or planar (c, d; surfactant-free
aqueous phase) anchoring at the interface of the LCs with the overlying
aqueous phase, before (a, c) and after (b, d) the introduction of interfacial
shear stress via convection in the aqueous phase. Scale bar, 200 µm.
e, f, Calculated $F_{\mathrm{edl}}$ (e; red lines), $F_{\mathrm{el}}$ (e; blue lines) and net force
(f; black lines) for an aqueous microdroplet ($R$ = 1.5 µm) containing
$C_{\mathrm{DTAB}}$ = 2 mM ($\zeta$ = 90 mV; Fig. 2h), with planar (solid lines) or homeotropic
(dashed lines) orientation of the LC at its interface (with $\zeta$ = −42.5 mV)
with the overlying aqueous phase. g, h, Sequential photographs showing
the ejection of microdroplets ($C_{\mathrm{DTAB}}$ = 2 mM and red tracer) from the LCs,
which is triggered by interfacial shear stresses, before (g) and after (h)
stirring the overlying water with a magnetic bar for 30 min to generate an
interfacial shear stress of about 10 N m$^{-2}$. Scale bar, 1 cm.


m (as seen in our experiments). In agreement with this calculation,
we observed aqueous phases sheared at $10^4$ s$^{-1}$ to generate interfacial
shear stresses of about 10 N m$^{-2}$ (the viscosity of water is $10^{-3}$ kg m$^{-1}$
s$^{-1}$) that triggered changes in the orientations and optical responses
of LC films with initially homeotropic (Fig. 3a, b) or planar (Fig. 3c,
d) orientations. When the initial orientation of the LC at its interface
with the overlying aqueous phase was planar, we observed microdroplets
containing cationic surfactant (with concentration $C_{\mathrm{DTAB}}$ = 2 mM)
to be elastically trapped within the LC, consistent with our calculation
that the attractive $F_{\mathrm{edl}}$ cannot overcome the repulsive $F_{\mathrm{el}}$ (solid
lines in Fig. 3e, f); we evaluated $F_{\mathrm{edl}}$ and $F_{\mathrm{el}}$ using $\varepsilon$ = 13.2, $B$ = 3/4
and $\zeta$ = −42.5 mV at the interface between the LC and the overlying
aqueous phase (Fig. 2h). The presence of mechanical shear at the
interface, however, caused the orientation of the LC to deviate from the
initial planar orientation (Fig. 3c, d; see also Fig. 4), which changed the
balance of orientation-dependent elastic and electrical double-layer
forces. For the limiting case of a fluctuation to a perpendicular orientation
($\varepsilon$ = 19.7 and $B$ = 1/2), we calculated the increase in $F_{\mathrm{edl}}$ and
decrease in $F_{\mathrm{el}}$ to be sufficient to trigger ejection of microdroplets (see
dashed lines in Fig. 3e, f and Methods). This prediction was confirmed
by experiments showing that microdroplet release was triggered by
the interfacial shear stresses (> 10 N m$^{-2}$) generated by shearing the
overlying aqueous phase with a magnetic bar (Fig. 3g, h).
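
The order-of-magnitude estimate of the interfacial shear stress quoted above can be checked with a short calculation. The sketch below assumes Newtonian behaviour of the aqueous phase, so that the shear stress is simply the product of viscosity and shear rate; the function and variable names are illustrative and are not taken from the original analysis.

```python
def shear_stress(viscosity: float, shear_rate: float) -> float:
    """Newtonian shear stress in N m^-2.

    viscosity  -- dynamic viscosity in kg m^-1 s^-1
    shear_rate -- shear rate in s^-1
    """
    return viscosity * shear_rate


# Values quoted in the text: water viscosity ~1e-3 kg m^-1 s^-1,
# aqueous phase sheared at ~1e4 s^-1.
water_viscosity = 1e-3
applied_shear_rate = 1e4

tau = shear_stress(water_viscosity, applied_shear_rate)
print(f"interfacial shear stress ~ {tau:.0f} N m^-2")
```

Under these assumptions the estimate reproduces the stress of about 10 N m$^{-2}$ reported for the stirred aqueous phase.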
Conventional materials used for controlled release of small or large 
molecules do not self-report triggering events, have not been shown to 
respond to the same combinations of physical, chemical and biological 
stimuli that we describe here, and do not permit programming of the 
diversity of dynamic responses (for example, pulsatile or continuous 
release), including self-regulating responses. For example, guided by 
the results shown in Fig. 1, we designed a cholesteric (chiral nematic)” 


542 | NATURE | VOL 557 | 24 MAY 2018 


LC system that was triggered by the touch of a human finger (Fig. 4a–d;
Supplementary Video 2). An N-to-I phase transition, which was
designed to occur upon exposure to physiological temperature, triggered
a change in the Bragg diffraction of light (as used in electronic
paper’ and LC thermometers’), and elastic forces $F_{\mathrm{e2}}$ generated ahead
of the N–I interface ejected a precise dose of chemical microcargo
(a dye for visualization and a cleaning agent). We note that the twisted
LC within the cholesteric phase modifies the magnitude of $F_{\mathrm{e}}$ relative
to an achiral nematic phase, but does not eliminate the force required
to eject microcargo”’. The programmed release of precise doses of
microcargo upon thermal activation differs from the behaviour of 
conventional materials that release chemical agents continuously 
upon thermal activation until the stimulus is removed or the agent 
reservoir is exhausted (for example, thermally responsive hydrogels” 
and lyotropic LC matrices’). Current methods that release precise and 
repeated doses of active agent require the use of devices incorporat- 
ing micrometre-scale chips, pumps, valves or flow channels”*. The LC 
response shown in Fig. 4a-d can also prevent the release of excess agent 
(minimizing waste or toxicity) and extend the lifetime of the material so 
that it can withstand repeated triggering events (for example, releasing 
cleaning agents from a touch screen by exposure to physiological tem- 
perature). The optical response of the LC also permits self-reporting of 
the triggering event, which can be used to signal the amount of agent 
released and the remaining useful lifetime of the material. 

We also used the results shown in Figs. 2 and 3 to design self-reporting 
and self-regulating LC materials that are triggered by mechanical shear 
stresses generated by the swimming motion of motile bacteria (for 
example, Escherichia coli). As noted above, interfacial shear stresses of
1 N m$^{-2}$ can reorient an LC (see Fig. 3 and associated text) and generate
LC interfacial velocities $v_{\mathrm{i}}$ of 10 µm s$^{-1}$ (calculated from viscous stresses
of about $\alpha(\partial v_{\mathrm{i}}/\partial d)$ within the LC film, where $\alpha$ is the LC effective
viscosity, $\alpha \approx 10^{-1}$ kg m$^{-1}$ s$^{-1}$). We observed that the arrival of motile
bacterial cells moving at 30–40 µm s$^{-1}$ triggered shear-induced changes
in the orientation of the LC, leading to optical reporting of the presence
of bacteria (Fig. 4i) and triggering of the release of microcargo of
antibacterial agents (Fig. 4e–h; Supplementary Video 3) via changes
in $F_{\mathrm{edl}}$ and $F_{\mathrm{el}}$ (see Fig. 3e, f). Non- ($v_{\mathrm{b}}$ = 0 µm s$^{-1}$) or weakly motile
bacteria ($v_{\mathrm{b}}$ < 10 µm s$^{-1}$) generated insufficient shear stresses to trigger
the LC (Extended Data Fig. 10). Whereas conventional controlled- 
release materials, used in areas such as healthcare, water purification 
and food safety, release active agents regardless of whether bacteria are 
present (which can lead to antibacterial resistance)?” or not, our LC 
material designs are self-regulating; they do not release antibacterial 
agents in the absence of bacteria (Fig. 4f) and release only the minimum 
amount of biocidal agent required to kill the bacterial cells (Fig. 4g). 
Specifically, self-regulation involves a feedback loop in which the bac- 
terial shear-triggered release of antibacterial agent causes cell death, 
which in turn eliminates the trigger and ends the release of antibacte- 
rial agent (Fig. 4h). In addition, the self-reporting function of the LC 
material signals the successful killing of the bacteria (Fig. 4i). To our 
knowledge, this is the first example of a material with the capability to 
provide a self-regulated release of chemoactive agents in response to 
mechanical forces generated by living cells. 
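
The velocity scale used in this paragraph follows from balancing the applied shear stress against the viscous stress in the LC film. The sketch below is only an order-of-magnitude check: it approximates the velocity gradient as the interfacial velocity divided by a layer thickness, and the thickness of 1 µm is an assumption made here for illustration, not a value taken from the text. All names are hypothetical.

```python
def interfacial_velocity(shear_stress: float,
                         layer_thickness: float,
                         lc_viscosity: float) -> float:
    """Velocity scale (m s^-1) from stress ~ viscosity * velocity / thickness."""
    return shear_stress * layer_thickness / lc_viscosity


applied_stress = 1.0        # N m^-2, stress quoted as sufficient to reorient the LC
lc_viscosity = 1e-1         # kg m^-1 s^-1, effective LC viscosity quoted in the text
layer_thickness = 1e-6      # m, ASSUMED thickness of the sheared interfacial layer

v_interface = interfacial_velocity(applied_stress, layer_thickness, lc_viscosity)
print(f"interfacial velocity ~ {v_interface * 1e6:.0f} um/s")

# Motile cells (30-40 um/s in the text) exceed this scale; weakly motile
# cells (< 10 um/s) do not.
motile_cell_speed = 30e-6
print("motile cells exceed the velocity scale:", motile_cell_speed > v_interface)
```

With these inputs the velocity scale is about 10 µm s$^{-1}$, which matches the threshold separating motile from weakly motile bacteria in the experiments described above.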

Our results demonstrate that LC materials can be designed to report 
thermal, chemical and mechanical triggers and release microcargo in 
response to these triggers. Given the broad range of triggers and under- 
lying colloidal interactions (for example, elastic, electrical double-layer 
and shear stresses) that are perturbed by these triggers, our results pro- 
vide the basis for a highly versatile approach. For example, in addition 
to the results reported above, we have used LCs in free-standing forms, 
such as LC emulsion droplets (Extended Data Fig. 11). We have also 
designed LC systems that respond to combinations of stimuli (for exam- 
ple, thermal plus chemical), thus providing more selective responses 
than is possible with a single stimulus (Extended Data Fig. 11). We note 
that specific biological processes, including specific binding events”® 
and interfacial enzymatic events”’, can lead to changes in interfacial 
charge, thus enabling the introduction of additional selectivity into 
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Fig. 4 | Release of microcargo from LCs by the heat of a human finger or 
by motile bacteria that generate interfacial shear stresses and interact
with microcargo in a self-regulated manner. a–d, Schematic illustration
(a) and sequential photographs (b–d) showing that the touch of a human
finger can trigger a change in the Bragg diffraction of light and release of
a well-defined pulse of microdroplets ($C_{\mathrm{SDS}}$ = 2 mM and red tracer) from
a cholesteric LC into the water (see Supplementary Video 2). Scale bar,
1 cm. e–h, Illustration (side view, e) and sequential photographs (side


the reporting and release of microcargo. Promising future directions 
include the use of magnetic or electric fields to modulate the responses 
and actions of LC systems, the use of other LC phases, such as smectic 
and edible lyotropic chromonic LCs” and the triggered release of solid 
microcargo from LCs (Extended Data Fig. 12). 
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Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 
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METHODS 


Comparison of the magnitudes of elastic and thermal energies. The elastic 
interaction energy E, between a microdroplet and the boundaries of the nematic 
LC (for example, the interfaces of N-overlying aqueous phase, N-glass, N-I) can 
be written as!) 


$$E_{\mathrm{e}} = A^2 B\pi K\,\frac{R^4}{(h+R)^3}, \tag{1}$$

where $A$ is a numerical factor ($A = 2.04$ for microdroplets of $R > K/W$ with homeotropic
anchoring in an N phase and $A = 0$ for microdroplets of $R < K/W$ or in an I
phase)", $B$ is the anchoring-dependent constant ($B = 3/4$ and $B = 1/2$ for planar
and homeotropic anchoring, respectively, at the boundaries of nematic LCs), $R$ is
the radius of the guest microdroplet, $h$ is the distance between a microdroplet
surface and the nematic boundary and $K = (K_1 + K_3)/2$ is the Frank elastic constant
of the LC, where $K_1$ and $K_3$ are elastic constants for splay and bend deformations,
respectively’. As a microdroplet approaches a nematic interface, $E_{\mathrm{e}}$ increases and
exhibits a maximum at $h = 0$. Because $K = 10^{-12}$ N for typical thermotropic LCs,
the maximum elastic interaction energy $E_{\mathrm{e}}^{\max}$ for a microdroplet with $R$ = 1 µm
and $B = 3/4$ is $9.8 \times 10^{-18}$ J. $k_{\mathrm{B}}T = 4.1 \times 10^{-21}$ J at $T$ = 25 °C, and thus
$E_{\mathrm{e}}^{\max} = 2{,}383\,k_{\mathrm{B}}T$.
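
The comparison with thermal energy can be reproduced numerically. The sketch below assumes that the maximum of the elastic interaction energy at $h = 0$ is $A^2 B\pi K R$, and it uses $K = 10^{-12}$ N because that value reproduces the quoted energy of $9.8 \times 10^{-18}$ J; both points are our reading of the text, and the function name is illustrative.

```python
import math

BOLTZMANN = 1.380649e-23  # J K^-1 (exact SI value)


def max_elastic_energy(A: float, B: float, K: float, R: float) -> float:
    """Elastic interaction energy at contact (h = 0), assumed to be A^2 * B * pi * K * R."""
    return A**2 * B * math.pi * K * R


# Parameters stated in the text: A = 2.04, B = 3/4 (planar anchoring), R = 1 um.
energy_max = max_elastic_energy(A=2.04, B=0.75, K=1e-12, R=1e-6)
thermal_energy = BOLTZMANN * 298.15  # k_B T at 25 degrees C

ratio = energy_max / thermal_energy
print(f"E_max = {energy_max:.2e} J, k_B T = {thermal_energy:.2e} J, ratio = {ratio:.0f}")
```

The ratio comes out close to the 2,383 $k_{\mathrm{B}}T$ stated above; the small difference depends on how $k_{\mathrm{B}}T$ is rounded.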
Comparison of the magnitudes of elastic forces and buoyant forces. Based on
equation (1), the elastic repulsive force $F_{\mathrm{e}}$ between a microdroplet and a nematic
interface can be written as!°!

$$F_{\mathrm{e}} = A^2 B\pi K\left(\frac{R}{h+R}\right)^4. \tag{2}$$

$F_{\mathrm{e}}$ is valid for a microdroplet in an N phase and needs to be modified for a
microdroplet at an N–I interface (see below). The maximum $F_{\mathrm{e}}$ is $F_{\mathrm{e}}^{\max} = A^2 B\pi K$
at $h = 0$ and the buoyant force $F_{\mathrm{g}}$ acting on a microdroplet in the LC is
$F_{\mathrm{g}} = (4/3)\pi R^3 g(\rho_{\mathrm{LC}} - \rho_{\mathrm{aq}})$. At 25 °C, $K_{\mathrm{5CB}}$ = 7.3 pN (ref. *!), $K_{\mathrm{E7}}$ = 14.4 pN (ref. *7),
$\rho_{\mathrm{5CB}}$ = 1.010 g cm$^{-3}$ (ref. 7), and we measured $\rho_{\mathrm{E7}}$ = 1.057 g cm$^{-3}$,
$\rho_{\mathrm{aq}}$ = 1.018 g cm$^{-3}$ with red dye and $\rho_{\mathrm{aq}}$ = 1.012 g cm$^{-3}$ with green dye. Therefore,
for a microdroplet (red dye) of $R$ = 3 µm and $B = 3/4$ in 5CB, we get $F_{\mathrm{e}}^{\max} = 8{,}072\,F_{\mathrm{g}}$
and for a microdroplet (green dye) of $R$ = 4 µm and $B = 3/4$ in E7, we have
$F_{\mathrm{e}}^{\max} = 1{,}194\,F_{\mathrm{g}}$.
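
The two force ratios can be verified from the stated parameters. The sketch below takes the maximum elastic force as $A^2 B\pi K$ and the buoyant force as $(4/3)\pi R^3 g(\rho_{\mathrm{LC}} - \rho_{\mathrm{aq}})$, as given in the text; the value $g = 9.8$ m s$^{-2}$ is an assumption (it is the value that reproduces the quoted ratios), and the function names are illustrative.

```python
import math

GRAVITY = 9.8  # m s^-2, assumed value


def max_elastic_force(A: float, B: float, K: float) -> float:
    """Maximum elastic repulsion at h = 0, in newtons."""
    return A**2 * B * math.pi * K


def buoyant_force(R: float, rho_lc: float, rho_aq: float, g: float = GRAVITY) -> float:
    """Buoyant force on a droplet of radius R (m); densities in kg m^-3.

    Positive values point upward (droplet lighter than the LC).
    """
    return (4.0 / 3.0) * math.pi * R**3 * g * (rho_lc - rho_aq)


# 5CB with red-dye droplets, R = 3 um (densities converted from g cm^-3).
ratio_5cb = max_elastic_force(2.04, 0.75, 7.3e-12) / abs(
    buoyant_force(3e-6, rho_lc=1010.0, rho_aq=1018.0))

# E7 with green-dye droplets, R = 4 um.
ratio_e7 = max_elastic_force(2.04, 0.75, 14.4e-12) / abs(
    buoyant_force(4e-6, rho_lc=1057.0, rho_aq=1012.0))

print(f"5CB: F_e_max / |F_g| = {ratio_5cb:.0f}")
print(f"E7:  F_e_max / |F_g| = {ratio_e7:.0f}")
```

Both ratios agree with the values given above to within rounding, which supports reading the elastic force as three to four orders of magnitude larger than buoyancy for micrometre-sized droplets.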

Net force F;"“(z) upon heating (Fig. 1). The net force F;“'(z) acting on a qua- 
si-static microdroplet with respect to the distance z between the centre of the 


microdroplet and the N-I interface (z= 0) upon heating can be written as 


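$$F^{\mathrm{net}}(z) = \underbrace{-A^2 B\pi K\left(\frac{R}{h_1+R}\right)^4}_{F_{\mathrm{e1}}} + \underbrace{A^2 B\pi K\left(\frac{R}{z}\right)^4}_{F_{\mathrm{e2}}} + \underbrace{\frac{4}{3}\pi R^3 g(\rho_{\mathrm{LC}}-\rho_{\mathrm{aq}})}_{F_{\mathrm{g}}} \tag{3}$$

at $z > R$ (in the N phase; see Fig. 1l (i)),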


Fe (z) =|—A’ BnK : ak al + [A’BrK] , 
(hy 4. z+R 2) 2 
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at -R<z<R (at the N-I interface; see Fig. 1] (ii)), and 
$$F^{\mathrm{net}}(z) = \underbrace{\frac{4}{3}\pi R^3 g(\rho_{\mathrm{LC}}-\rho_{\mathrm{aq}})}_{F_{\mathrm{g}}} \tag{5}$$

at $z < -R$ (in the I phase; see Fig. 1l (iii)). In these equations, $h_1$ is the distance
between the microdroplet surface and the interface between the LC and the overlying
aqueous phase (Fig. 1a), $\alpha = 0$ for $R > K/W$, $A = \beta = 0$ for $R < K/W$, and $B = 3/4$
(parallel orientation at nematic boundaries).

When a microdroplet ($R > K/W$; homeotropic anchoring) is in the N phase
($z > R$), the net force $F^{\mathrm{net}}(z)$ arises from $F_{\mathrm{g}}$ and $F_{\mathrm{e}}$ with!!! $A = 2.04$; see Fig. 1l (i).
As the N–I interface ($z = 0$) approaches the microdroplet, $F^{\mathrm{net}}(z)$ increases
and becomes positive (upward) at $z < z^*$ (at $z = z^*$, $F^{\mathrm{net}} = 0$) owing to the elastic
repulsion from the N–I interface ($F_{\mathrm{e2}}$ in equation (3)); $z^*$ = 17.0 µm for a
microdroplet with $R$ = 1.5 µm and $h_1$ = 1 mm in 5CB. This reveals that the
microdroplet will be levitated at $z = z^*$ above the N–I interface. $F^{\mathrm{net}}(z)$ shows a
maximum at $z = R$ (red curve in Fig. 1l).
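
The levitation height can be estimated by finding where the net force changes sign. The sketch below is a hedged reconstruction, not the original calculation: it assumes that the net force in the N phase is the sum of a downward elastic repulsion from the LC–aqueous interface, an upward elastic repulsion from the N–I interface that decays as $(R/z)^4$, and buoyancy, and it uses the 35 °C parameters listed for 5CB ($K$ = 3 pN, $\rho_{\mathrm{5CB}}$ = 1.000 g cm$^{-3}$, $\rho_{\mathrm{aq}}$ = 1.013 g cm$^{-3}$) with an assumed $g = 9.8$ m s$^{-2}$. Function names are illustrative.

```python
import math

A_FACTOR = 2.04   # numerical factor for R > K/W, homeotropic droplet anchoring
B_PLANAR = 0.75   # anchoring constant for parallel orientation at the boundaries
GRAVITY = 9.8     # m s^-2, assumed value


def net_force_nematic(z, R, h1, K, rho_lc, rho_aq):
    """Assumed net force (N) on a droplet whose centre is a height z above the N-I interface."""
    prefactor = A_FACTOR**2 * B_PLANAR * math.pi * K
    f_e1 = -prefactor * (R / (h1 + R))**4           # repulsion from LC-aqueous interface
    f_e2 = prefactor * (R / z)**4                   # repulsion from N-I interface
    f_g = (4.0 / 3.0) * math.pi * R**3 * GRAVITY * (rho_lc - rho_aq)
    return f_e1 + f_e2 + f_g


def levitation_height(R, h1, K, rho_lc, rho_aq):
    """Bisection for the height z* at which the assumed net force vanishes."""
    low, high = R, 1e-3  # net force is positive at z = R and negative far away
    for _ in range(200):
        mid = 0.5 * (low + high)
        if net_force_nematic(mid, R, h1, K, rho_lc, rho_aq) > 0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


z_star = levitation_height(R=1.5e-6, h1=1e-3, K=3e-12, rho_lc=1000.0, rho_aq=1013.0)
print(f"z* ~ {z_star * 1e6:.1f} um")
```

Under these assumptions the balance point lies close to the quoted $z^*$ of 17.0 µm, and the contribution from the distant LC–aqueous interface is negligible for $h_1$ = 1 mm.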

When the microdroplet penetrates the N-I interface (-R < z< R; Fig. 1] (ii)), 
two additional forces are generated from the changes in interfacial tensions'”** 
(Fi) and elastic strain!”*> (F.«). These forces drive the microdroplets into the I 
phase (Fig. 1] (ii)); see separate sections on Fi, and F.« for additional detail. 
Additionally, F, needs to be modified because the part of the microdroplet that 
protrudes into the I phase no longer strains the LC. Because R in equation (2) is 
the radius of a microdroplet in the N phase, for simplicity, we used $(R + z)/2$ for
heating and $(R - z)/2$ for cooling as the effective radius of the part of each
microdroplet in the N phase. We also need to take into account the decrease in the
topological strength $m$ of the microdroplet” from 1 to $\theta/\pi$, where $\theta$ ($0 < \theta < \pi$)
is half of the central angle of the part of the microdroplet in the N phase; see
Extended Data Fig. 4a (ii). Because the elastic interaction is proportional!” to $m^2$,
the numerical factor $A$ in equation (4) can be described as
$A = 2.04m = (2.04/\pi)\cos^{-1}(-z/R)$. Consequently, $F^{\mathrm{net}}(z)$ decreases rapidly at
$-R < z < R$ (blue curve in Fig. 1l).
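
The position-dependent numerical factor described above is easy to tabulate. The sketch below implements the heating case, $A = (2.04/\pi)\cos^{-1}(-z/R)$ for $-R < z < R$, and extends it with the limiting values $A = 2.04$ (droplet fully in the N phase) and $A = 0$ (droplet fully in the I phase) stated elsewhere in the Methods. The function name is hypothetical.

```python
import math


def numerical_factor_heating(z: float, R: float, a_bulk: float = 2.04) -> float:
    """Numerical factor A for a droplet whose centre is at height z relative to the N-I interface.

    z >= R  : droplet entirely in the nematic phase, A = a_bulk
    z <= -R : droplet entirely in the isotropic phase, A = 0
    otherwise A = a_bulk * theta / pi with theta = arccos(-z / R)
    """
    if z >= R:
        return a_bulk
    if z <= -R:
        return 0.0
    theta = math.acos(-z / R)  # half of the central angle of the part in the N phase
    return a_bulk * theta / math.pi


radius = 1.5e-6
for fraction in (1.0, 0.5, 0.0, -0.5, -1.0):
    z = fraction * radius
    print(f"z/R = {fraction:+.1f}  ->  A = {numerical_factor_heating(z, radius):.3f}")
```

Because the elastic force scales with $A^2$, halving $A$ at $z = 0$ already reduces the elastic contribution by a factor of four, which is consistent with the rapid decrease of the net force described above.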

When the microdroplet is in the I phase ($z < -R$; Fig. 1l (iii)), the net force
$F^{\mathrm{net}}(z)$ is composed of only $F_{\mathrm{g}}$; $A = 0$ and thus $F_{\mathrm{e}} = 0$. Therefore, the microdroplets
in 5CB ($\rho_{\mathrm{5CB}} < \rho_{\mathrm{aq}}$) sink whereas the microdroplets in E7 ($\rho_{\mathrm{E7}} > \rho_{\mathrm{aq}}$) rise.

Upon heating, the elastic repulsion from the N–I interface promotes release of
the microdroplets ($F_{\mathrm{e2}}$ in equation (3)). Therefore, the moving N–I interface can
only transport microdroplets with $R > K/W$ in the nematic phase ($F_{\mathrm{e2}} \neq 0$). The
interface passes through the microdroplets with $R < K/W$ because $F_{\mathrm{e2}} = 0$.

Net force F?“(z) upon cooling (Fig. 1). The net force F"“'(z) acting on a quasi- 
static microdroplet with respect to z can be written as: 


$$F^{\mathrm{net}}(z) = \underbrace{\frac{4}{3}\pi R^3 g(\rho_{\mathrm{LC}}-\rho_{\mathrm{aq}})}_{F_{\mathrm{g}}} \tag{6}$$
at z > R (in the I phase; Extended Data Fig. 4a (i)), 
Feet (2) =|4°BrK +. [7 ;| + [-ABnK] , 
(n, +452) ‘2 
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at -R<z< R (at the N-I interface; see Extended Data Fig. 4a (ii)), and 
ey R)* 4 
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at z < —R (in the N phase; see Extended Data Fig. 4a (iii)). In these equations, F.3 
is the upward elastic repulsion from an LC-glass interface and hy) is the distance 
between a microdroplet surface and the LC-glass interface (Extended Data 
Fig. 5k). For R> K/W, A= (2.04/n) cos !(z/R) in equation (7) and A = 2.04 in 
equation (8); for R< K/W, A=0; B =3/4. In the I phase (z > R; see Extended Data 
Fig. 4a (i)), aqueous microdroplets in 5CB sink because pscp < Pag (that is, 
Fo'(z) < 0). 

In contrast to heating, the two additional forces ($F_{\mathrm{it}}$ and $F_{\mathrm{e*}}$) at $-R < z < R$ are
directed upwards upon cooling (see Extended Data Fig. 4a (ii)); $F_{\mathrm{it}} > 0$ and $F_{\mathrm{e*}} > 0$.
As a result, $F^{\mathrm{net}}(z)$ becomes positive and exhibits a maximum at $-R < z < R$
(Extended Data Fig. 4a). Importantly, upon cooling $F_{\mathrm{it}} > 0$ and $F_{\mathrm{e*}} > 0$, regardless
of $R$. This indicates that the cooling N–I interface can transport the microdroplets
with both $R > K/W$ and $R < K/W$ (Extended Data Fig. 4b), whereas upon heating
the interface cannot transport the microdroplets with $R < K/W$ (because $F_{\mathrm{e}} = 0$).

At z < —R (N phase; see Extended Data Fig. 4a (iii)), the microdroplets with 
R> K/W are sequestered in a nematic bulk whereas the microdroplets with 
R<K/W sediment away from the N-I interface. 

In Fig. 1c–i, $v_{\mathrm{NI}}$ upon both cooling and heating was 37 ± 3 µm s$^{-1}$, for which
our model predicts the dispensing of microdroplets with $K/W < R < 3$ µm upon
heating and 0.6 µm $< R <$ 6 µm upon cooling. This prediction is consistent with
our observation that the amount of tracer released upon cooling was greater than
that released upon heating (Fig. 1g).
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Interfacial tension force, Fi. When a microdroplet is near an interface dividing 
two immiscible fluids (N and I phase in our case), Fj, arises to minimize the surface 
tension energy™. Typically, microdroplets are stabilized at the interface between 
two immiscible fluids because the interface tensions are similar in magnitude to 
each other. In thermotropic LCs, however, the surface tension at the N-I interface 
(nz) is much smaller than the surface tension at the interfaces between the aque- 
ous solution and the N (Gag-n) or I (Gaq-1) phases. In the case of 5CB, for example, 
at T= 35°C, dag-n © 7X 107) m™, dag © 6 X 10-7 J m? and on © 107° J m? 
(Gag-N > Fag-1> On1)?"*. As a result, the aqueous microdroplets at the N-I inter- 
face are expelled to the I phase™. For simplicity, we assume that Fi is active only 
when the microdroplet contacts the N-I interface; Fi, =0 at |z| > R. 

Elastic force, F.x. When a microdroplet penetrates the N-I interface (-R<z< R), 
the elastic force acting on the microdroplet is modified!”*° by F.+. Whereas F. acts 
to keep microdroplets in the nematic phase, F.+ expels the microdroplets into the 
isotropic phase to minimize the elastic free energy. In the weak anchoring regime 
(R< K/W), F.« originates from the anchoring of the director at the microdrop- 
let surface and the director deformation in the bulk nematic phase, and can be 
written as 


E(R<K / W)=(WR f(z /R)] f,(z/R) (9) 


surface 


(WR)? 
K 


bulk 


where f;(z/R) is a dimensionless function of the penetration depth (z/R) of a 
microdroplet into an N phase!”*?. 
In the strong anchoring regime (R > K/W), F,« is given by 
EAR>K/W)=(K f@/R)], (10) 
where f3(z/R) is also a dimensionless function!”*°. Andrienko et al.*” found that 
the force acting on a particle passing through an N-I interface is linearly propor- 
tional to the penetration depth z/R. In our evaluation, therefore, we simplified the 
dimensionless functions to f (z/R) =a(+1—2z/R)and f (z/R) = B(F1-—2/R) 
where - and + denote N-to-I and I-to-N phase transitions, respectively. 
Electrical double-layer interaction, Fa (Figs. 2 and 3). F.q can be calculated as 


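$$F_{\mathrm{edl}} = -4\pi\varepsilon_0\varepsilon\,\frac{R}{\lambda}\left(\frac{k_{\mathrm{B}}T}{e_{\mathrm{c}}}\right)^2\Upsilon_{\mathrm{d}}\Upsilon_{\mathrm{i}}\,e^{-h_1/\lambda} \tag{11}$$

where $\varepsilon_0$ is the vacuum permittivity, $\varepsilon$ is the relative permittivity of the LC, $e_{\mathrm{c}}$ is
the elementary charge, $\Upsilon_{\mathrm{d}}$ is the effective surface potential of the microdroplet, $\Upsilon_{\mathrm{i}}$
is the effective surface potential of the interface between the LCs and the overlying
aqueous phase and $\lambda$ is the Debye screening length’. $\lambda$, $\Upsilon_{\mathrm{d}}$ and $\Upsilon_{\mathrm{i}}$ can be written as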


i EyekgT a 8 tanh (DC ¢./kgT) 
n 2 P 
D 
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Yo (R/) + kp? (12) 
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kpT 
where $N_{\mathrm{A}}$ is the Avogadro constant ($N_{\mathrm{A}} = 6.02 \times 10^{23}$ mol$^{-1}$), $I$ is the ionic strength
of the LC, $D$ is a numerical factor and $\zeta_{\mathrm{d}}$ and $\zeta_{\mathrm{i}}$ are the zeta potentials at the
microdroplet surface and LC interface, respectively. In our evaluation, we used $D$
values between 1 and 2 and $I = 1.8 \times 10^{-5}$ mol m$^{-3}$ (corresponding to $\lambda \approx 0.66$ µm
at the LC interface with planar anchoring); $\lambda \approx 1.5$ µm in 5CB without added
electrolyte**. The range of $D$ values reflects the fact that the surface potential
(~$|D\zeta|$) is typically greater than the corresponding zeta potential? ($|\zeta|$). We note
that the exact value of $D$ does not change the conclusions of our model. The value
of $D$ will influence the calculated radius of the microdroplets above which the
kinetic barrier ($F^{\mathrm{net}}$) will prevent release of the microdroplets. Because equation
(11) is valid at $h > \lambda$ and it tends to overpredict”! $F_{\mathrm{edl}}$ at $h < \lambda$, we only evaluated
$F_{\mathrm{edl}}$ at $h > \lambda$.
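In Figs. 2 and 3, therefore, the net force $F^{\mathrm{net}}$ can be written as

$$F^{\mathrm{net}} = \underbrace{-4\pi\varepsilon_0\varepsilon\,\frac{R}{\lambda}\left(\frac{k_{\mathrm{B}}T}{e_{\mathrm{c}}}\right)^2\Upsilon_{\mathrm{d}}\Upsilon_{\mathrm{i}}\,e^{-h_1/\lambda}}_{F_{\mathrm{edl}}}\ \underbrace{-\,(2.04)^2 B\pi K\left(\frac{R}{h_1+R}\right)^4}_{F_{\mathrm{el}}} \tag{13}$$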


As shown in Fig. 2a–c, microdroplets containing anionic SDS ($C_{\mathrm{SDS}}$ = 9 mM)
elastically sequestered in LCs can be ejected by an upward $F_{\mathrm{edl}}$ (> 0) induced by
the addition of cationic DTAB ($C_{\mathrm{DTAB}}$ = 10 mM) into an overlying phase; for a
microdroplet of $R$ = 1.5 µm in 5CB, $F_{\mathrm{edl}}$ = 9.2 pN and $F_{\mathrm{el}}$ = −6.2 pN at $h_1$ = 1 µm.
As evidenced by the optical response shown in Fig. 2, SDS or DTAB causes an
anchoring transition from a parallel to perpendicular orientation at the interface
between the LCs and the overlying aqueous phase. Because the anchoring transition
involves the variation of $B$ and $\varepsilon$ (for the parallel orientation $B = 3/4$ and
$\varepsilon = (\varepsilon_\perp + \varepsilon_\parallel)/2$, and for the perpendicular orientation $B = 1/2$ and $\varepsilon = \varepsilon_\parallel$), $F_{\mathrm{edl}}$
increases, whereas $F_{\mathrm{el}}$ decreases. For 5CB, $\varepsilon_\perp = 6.7$ and $\varepsilon_\parallel = 19.7$ at $T$ = 25 °C and
we used $\varepsilon = (\varepsilon_\perp + \varepsilon_\parallel)/2 = 13.2$ for the parallel orientation at the interface between
the LC and the overlying aqueous phase because the microdroplet interface exhibits
homeotropic anchoring.
Ejection of microdroplets upon the convection or the arrival of motile bacteria
(Figs. 3 and 4f–h). In the system, the net force is also expressed by equation (13).
Microdroplets with cationic DTAB ($C_{\mathrm{DTAB}}$ = 2 mM) have an upward $F_{\mathrm{edl}}$ but are
elastically sequestered because $F_{\mathrm{edl}}$ cannot overcome $F_{\mathrm{el}}$ under planar anchoring
at the interface between the LC and the overlying aqueous phase (solid lines in
Fig. 3e, f); $\varepsilon = (\varepsilon_\perp + \varepsilon_\parallel)/2 = 13.2$, $B = 3/4$, $F_{\mathrm{edl}}$ = 6.5 pN and $F_{\mathrm{el}}$ = −9.3 pN at
$h_1$ = 1 µm for $R$ = 1.5 µm. The introduction of convection or of the motile bacteria
in the overlying phase, however, generates a dynamic fluctuation of the director
orientation at the LC interface (Figs. 3a–d and 4i), leading to changes in both $F_{\mathrm{el}}$
and $F_{\mathrm{edl}}$. If homeotropic anchoring is induced, for example, $\varepsilon$ increases to 19.7
($\varepsilon = \varepsilon_\parallel$, $F_{\mathrm{edl}}$ = 9.9 pN) from 13.2 ($F_{\mathrm{edl}}$ = 6.5 pN), while $B$ decreases to 1/2 ($F_{\mathrm{el}}$ = −6.2
pN) from 3/4 ($F_{\mathrm{el}}$ = −9.3 pN). Consequently, an enhanced $F_{\mathrm{edl}}$ can override the
reduced $F_{\mathrm{el}}$, enabling the ejection of microdroplets from the LC (dashed lines in
Fig. 3e, f).
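
The switch from sequestration to ejection can be illustrated with the numbers quoted in this section. The sketch below computes only the elastic term, assumed here to be $-(2.04)^2 B\pi K\,[R/(h_1+R)]^4$ with $K_{\mathrm{5CB}}$ = 7.3 pN, and takes the electrical double-layer forces directly from the text (6.5 pN and 9.9 pN) rather than recomputing them. Names are illustrative.

```python
import math


def elastic_force(B: float, K: float, R: float, h1: float, A: float = 2.04) -> float:
    """Elastic repulsion (N) from the LC-aqueous interface; negative means directed into the LC."""
    return -(A**2) * B * math.pi * K * (R / (h1 + R))**4


K_5CB = 7.3e-12    # N, elastic constant of 5CB at 25 degrees C
RADIUS = 1.5e-6    # m
H1 = 1e-6          # m, separation between droplet surface and interface

f_el_planar = elastic_force(B=0.75, K=K_5CB, R=RADIUS, h1=H1)
f_el_homeotropic = elastic_force(B=0.5, K=K_5CB, R=RADIUS, h1=H1)

# Electrical double-layer forces as quoted in the text (not recomputed here).
f_edl_planar = 6.5e-12
f_edl_homeotropic = 9.9e-12

net_planar = f_edl_planar + f_el_planar
net_homeotropic = f_edl_homeotropic + f_el_homeotropic

print(f"planar:      F_el = {f_el_planar * 1e12:.1f} pN, net = {net_planar * 1e12:.1f} pN")
print(f"homeotropic: F_el = {f_el_homeotropic * 1e12:.1f} pN, net = {net_homeotropic * 1e12:.1f} pN")
```

The elastic term reproduces the quoted −9.3 pN and −6.2 pN, and the sign of the net force changes from negative (trapped) under planar anchoring to positive (ejected) under homeotropic anchoring.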
Continuous release of microdroplets through thermal tuning of the elastic
properties of the LC (Extended Data Fig. 5). To illustrate this point, we used a
nematic LC called E7, with $\rho_{\mathrm{E7}} > \rho_{\mathrm{aq}}$ and $T_{\mathrm{NI}}$ = 60 °C. At 25 °C, aqueous microdroplets
(0.5 µm $< R <$ 4 µm) were elastically sequestered in E7 because $F_{\mathrm{e}}^{\max}/F_{\mathrm{g}}$ = 1,194.
Both thermal tuning of the elastic properties (at $T < T_{\mathrm{NI}}$; Extended Data Figs. 5
and 7) and isothermal tuning of the elastic properties by partitioning solutes into
the LC (Extended Data Fig. 6) led to continuous release of microdroplets sequestered
in the LC into the overlying aqueous phase. By contrast, when exposed to the
same thermal stimulus, 5CB showed pulsatile release of microdroplets (Extended
Data Fig. 5e–i). The experiment and modelling revealed that release from E7
occurred when $|F_{\mathrm{e}}|$ decreased below $|F_{\mathrm{g}}|$ (Extended Data Fig. 5k), which in turn
depended on $R$, $h$ and $T$ (thermal) or $C$ (solute). With E7 at $T$ = 59 °C, we calculated
this constraint to be satisfied for $R > 22$ µm (Extended Data Fig. 5l, m). Consistent
with this prediction, we observed that individual LC microdroplets with $R < 10$ µm
were not released. Large microdroplets with $R > 22$ µm, or clusters of LC
microdroplets formed through LC-mediated elastic interactions’ with an effective
radius of $R > 22$ µm, were released (Extended Data Fig. 7).
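
A rough estimate of the release threshold follows from equating the maximum elastic force with the buoyant force. The sketch below is a simplification and not the full model: it ignores the dependence on $h$ and the repulsion from the LC–glass interface, takes $F_{\mathrm{e}}^{\max} = A^2 B\pi K$, and uses the E7 parameters listed for 59 °C ($K$ = 2 pN, $\rho_{\mathrm{E7}}$ = 1.028 g cm$^{-3}$, $\rho_{\mathrm{aq}}$ = 0.987 g cm$^{-3}$) with an assumed $g = 9.8$ m s$^{-2}$. The function name is hypothetical.

```python
import math


def critical_radius(K: float, rho_lc: float, rho_aq: float,
                    A: float = 2.04, B: float = 0.75, g: float = 9.8) -> float:
    """Radius (m) at which buoyancy equals the maximum elastic force A^2 * B * pi * K."""
    f_e_max = A**2 * B * math.pi * K
    density_difference = abs(rho_lc - rho_aq)  # kg m^-3
    return (3.0 * f_e_max / (4.0 * math.pi * g * density_difference)) ** (1.0 / 3.0)


# E7 at 59 degrees C with green-dye droplets (densities converted from g cm^-3).
r_crit_59 = critical_radius(K=2e-12, rho_lc=1028.0, rho_aq=987.0)
# E7 at 25 degrees C for comparison.
r_crit_25 = critical_radius(K=14.4e-12, rho_lc=1057.0, rho_aq=1012.0)

print(f"critical radius at 59 C ~ {r_crit_59 * 1e6:.1f} um")
print(f"critical radius at 25 C ~ {r_crit_25 * 1e6:.1f} um")
```

This simplified balance gives a threshold of roughly 23 µm at 59 °C, close to the 22 µm obtained from the full calculation, and a larger threshold at 25 °C, in line with droplets of a few micrometres remaining sequestered at room temperature.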

The net force F;'“' acting on a guest microdroplet in a nematic phase comprises 
F, and F, (Extended Data Fig. 5k), and is calculated as 
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Fey Fg 
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where A =2.04 and B= 3/4. Here, the upward forces are Fy (pg7 > Paq) and the 
elastic repulsion from the LC-glass interface (F.3). 

Parameters used in the calculations. For 5CB, at T =25°C we used Kscg =7.3 pN 
(ref. #!) and pscp= 1.010 gcm~? (ref. 33), and at T=35°C (=Ts<") we used Kscp=3 
pN (ref. >"), pscg = 1.000 gcm~? (ref. 3°), W=10-° J m~? (ref. 8) and 
nscp=0.015kgm~! s~! (ref. 4°), The ony value of 5CB is 1.5 x 10-5 J m? (ref. *°). 
In our calculation, however, we used on; = 1.5 x 10~° J m~? because the surfactants 
added to the microdroplets reduce the surface tension*!~*?, For E7, we used 
$K_{\mathrm{E7}}$ = 14.4 pN (25 °C), 10.25 pN (40 °C), 7 pN (50 °C) and 2 pN (59 °C)*’, and we
measured $\rho_{\mathrm{E7}}$ = 1.057 g cm$^{-3}$ (25 °C), 1.045 g cm$^{-3}$ (40 °C), 1.037 g cm$^{-3}$ (50 °C)
and 1.028 g cm$^{-3}$ (59 °C). We chose $\alpha$ and $\beta$ on the basis of the experimental results
as $\alpha = 9.9$ and $\beta = 0$ for $R < K/W$ and $\alpha = 0$, $\beta = 4.4$ for $R > K/W$. We used
$h_1 = h_2$ = 1 mm for Fig. 1l and Extended Data Fig. 4, and $h_1 + h_2 + 2R$ = 3.5 mm
(thickness of LC layer in mini-wells) for Extended Data Fig. 5. For red-dye
microdroplets (Fig. 1l, m and Extended Data Fig. 4), we measured
$\rho_{\mathrm{aq}}$ = 1.018 g cm$^{-3}$ (25 °C) and 1.013 g cm$^{-3}$ (35 °C). For green-dye microdroplets
(Extended Data Fig. 5l, m), we measured $\rho_{\mathrm{aq}}$ = 1.012 g cm$^{-3}$ (25 °C), 1.004 g cm$^{-3}$
(40 °C), 0.996 g cm$^{-3}$ (50 °C) and 0.987 g cm$^{-3}$ (59 °C). In $F_{\mathrm{edl}}$ calculations, we used
$\zeta$ = +108 ($C_{\mathrm{DTAB}}$ = 10 mM), +90 ($C_{\mathrm{DTAB}}$ = 2 mM), −116 ($C_{\mathrm{SDS}}$ = 9 mM), −108.5
($C_{\mathrm{SDS}}$ = 5 mM) and −42.5 mV (no surfactants).

Materials. Nematic liquid crystals, 5CB and E7, and the chiral dopant S-811 
were purchased from HCCH (Jiangsu Hecheng Display Technology Co., Ltd). 
Water-soluble dyes that were used as tracers were purchased from MontBlanc. 
SDS, DTAB, dimethyloctadecyl[3-(trimethoxysilyl)propyl]ammonium chloride
(DMOAP), FITC-dextran, silver acetate and lipopolysaccharides (LPS) were
purchased from Sigma-Aldrich. Lysogeny broth was purchased from Becton,
Dickinson and Company. Transmission electron microscopy (TEM) grids were
purchased from Electron Microscopy Sciences. The polymeric alignment layer 
(PI2555) was purchased from HD Microsystems. A Sylgard 184 silicone elastomer 
kit for preparing polydimethylsiloxane (PDMS) was purchased from Dow Corning. 
Biopsy punches were obtained from Integra Miltex. 

Preparation of LCs containing aqueous microdroplets. To stabilize the disper- 
sions of aqueous microdroplets in the LCs, we first added either SDS or DTAB at 
a specified concentration to the aqueous solutions of water-soluble dyes (10 wt% 
with respect to aqueous solution). The aqueous solutions of dyes were emulsified
into the nematic LCs (5CB or E7) by vortexing for 1 min at 3,000 revolutions
per minute (r.p.m.) and sonication for 10 min. The volume fractions (v%)
of aqueous microdroplets ($C_{\mathrm{aq}}$) dispersed in the LCs were 20 v% for Figs. 1c–h
and 2g ($C_{\mathrm{SDS}}$ = 9 mM), 0.5 v% for Fig. 1j, k ($C_{\mathrm{SDS}}$ = 9 mM), 5 v% for Fig. 2a–c
($C_{\mathrm{SDS}}$ = 9 mM), 10 v% for Figs. 3g, h and 4f–h ($C_{\mathrm{DTAB}}$ = 2 mM) and 10 v% for
Fig. 4b–d ($C_{\mathrm{SDS}}$ = 2 mM). Each surfactant was present at a concentration below
its critical micelle concentration. 

Preparation of LC-filled mini-wells. Mini-wells were made of PDMS. Elastomer 
base and curing agent from a Sylgard elastomer kit were mixed in a ratio of 10:1. 
The mixture was then cured at 60°C for 2h. A cured PDMS disk with a diameter 
of 6mm was obtained using a 6-mm biopsy punch. Subsequently, a cylindrical hole 
with a diameter of 3mm was punched at the centre of the 6-mm-diameter disk 
using a second biopsy punch. The PDMS was treated with an oxygen plasma for 
20s and bonded to a glass substrate to create a mini-well with a depth of 3.5mm. 
After fabrication, the mini-wells were stored for at least 3 days before filling with 
18 µl of LCs containing guest microdroplets. Subsequently, the mini-wells were
submerged into glass vials filled with 2 ml of aqueous solutions. If used within 
3 days, the PDMS surface was sufficiently hydrophilic to allow water to spread 
between the LC phase and the PDMS surface. 

Preparation of samples for microscopic observations. Experimental cells for 
microscopic observations were assembled from glass plates coated with a polyimide 
film (PI2555; Fig. 1j, k and Extended Data Fig. 2) or DMOAP (Fig. 4f–h and
Extended Data Figs. 9b–e and 10), which caused planar and homeotropic alignment,
respectively. PI2555 substrates were rubbed to achieve unidirectional alignment
of the director n and were assembled in an anti-parallel fashion. The gaps (100–300 µm)
between the plates were set by using double-sided tape. PI2555 cells were filled 
with the LCs containing microdroplets and then observed while either heating 
or cooling one edge of the optical cell to drive the N-I interface across the field of 
view. DMOAP cells were filled with aqueous solutions, followed by the LCs, and 
were then characterized using microscopy. 

Preparation of LC films. As described in Figs. 2a—c and 3a—d (and Extended Data 
Figs. 3, 9f-i, 12), TEM grids were placed onto DMOAP-coated glass substrates 
and filled with LC containing the aqueous microdroplets. Subsequently, the films 
(40 µm in thickness) were submerged into aqueous baths. The DMOAP-coated
glass was used to orient the LC perpendicular to the glass substrate and prevent 
penetration of the aqueous phase between the LC and the glass substrate. 
Preparation of interfacial shear stresses (Fig. 3a—-d, g and h). As shown in 
Figs. 3a—d, the LC films were submerged into aqueous SDS (Csps = 2 mM; homeo- 
tropic anchoring at LC-aqueous interface, Fig. 3a) and pure water (planar anchor- 
ing at LC-aqueous interface, Fig. 3c). Subsequently, shear stresses were induced 
at the interface between the LC and the bulk aqueous phase by circulating the 
latter with a pipette. Figure 3g and h shows a mini-well filled with 5CB containing 
microdroplets being submerged into a water bath. Subsequently, shear stresses were 
induced at the interface between the LCs and the bulk aqueous phase by rotating 
a magnetic bar (700-800 r.p.m.). 

Preparation of cholesteric LC (Fig. 4a-d). 20 wt% of chiral dopant (S-811)
was dissolved in 5CB. The clearing temperature of the LC was measured to be
T_NI = 27 °C.

Preparation of bacterial dispersions (Fig. 4f-h). Escherichia coli (strain MG1655)
were grown aerobically in 1 ml of lysogeny broth [1% (w/v) tryptone, 0.5% (w/v)
yeast extract and 1% (w/v) NaCl] at temperature T = 37 °C with agitation (200
r.p.m.) for 12 h. To achieve motile bacteria, the culture was diluted into 2 ml of
fresh lysogeny broth in a 1:100 ratio and the bacteria were grown again for 2 h
(T = 37 °C, 200 r.p.m.). The density of bacteria in the resulting dispersion was
10^7-10^8 cells ml^-1.

Optical responses in Figs. 1e and 4i. The optical response shown in Fig. 1e is the
intensity measured from the micrographs of mini-wells between crossed polarizers
(insets in Fig. 1e). For the optical response in Fig. 4i, the transmittance at the
LC-aqueous interface was averaged for every frame and the average values are
plotted for 4 s for each sequence (Fig. 4f-h).
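
The frame-by-frame averaging described for Fig. 4i can be sketched as follows. This is only an illustration, not the authors' analysis code: the rectangular region of interest, the frame interval and the function names are assumptions introduced here, and frames are represented as plain nested lists of pixel intensities.

```python
from statistics import mean


def frame_mean_transmittance(frame, roi):
    """Average pixel intensity inside a rectangular region of interest.

    frame: 2D list of pixel intensities (a list of rows).
    roi:   (row_start, row_stop, col_start, col_stop), stop-exclusive.
           Hypothetical stand-in for the LC-aqueous interface region.
    """
    r0, r1, c0, c1 = roi
    pixels = [value for row in frame[r0:r1] for value in row[c0:c1]]
    return mean(pixels)


def transmittance_trace(frames, roi, frame_interval_s):
    """Return one (time in seconds, mean transmittance) pair per frame."""
    return [
        (index * frame_interval_s, frame_mean_transmittance(frame, roi))
        for index, frame in enumerate(frames)
    ]


# Two tiny synthetic frames, averaged over the full 2 x 2 area.
example_frames = [[[0, 2], [4, 6]], [[1, 1], [1, 1]]]
example_trace = transmittance_trace(example_frames, (0, 2, 0, 2), 0.5)
```

Plotting the second element of each pair against the first over a 4 s window would give a curve of the kind described above.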

LC system with multiple stimuli (Extended Data Fig. 11). In this system, the
mechanisms leading to the release of microcargo are the same as those described
in Fig. 1 (thermal stimulus) and Fig. 2 (chemical stimulus). Specifically, the net
force acting on microdroplets in the LC is the sum of the electrical double-layer
force F_edl and the elastic force generated by the moving N-I interface (see
equations (3) and (11)) during heating, and the sum of F_edl and the elastic-strain
and interfacial-tension forces (see equation (7)) during cooling. Because the
sign and amplitude of F_edl can be manipulated by charged surfactants added into
overlying aqueous phases, we predicted that the combination of thermal and chemical
stimuli would enable selective release of two agents that are oppositely charged.
To test this prediction, Wells 1 and 2 were filled with 5CB containing microdroplets
(C_aq = 20 v%) doped with DTAB (C_DTAB = 9 mM, ζ > 0, and green tracer) and
SDS (C_SDS = 9 mM, ζ < 0, and red tracer), respectively, and then were submerged
into an aqueous SDS solution (3 mM, ζ < 0). Under these conditions, F_edl > 0 in
Well 1 (ζ < 0 at the interface between the LCs and the overlying aqueous phase, but
ζ > 0 at their interface with the microdroplets), while F_edl < 0 in Well 2 (ζ < 0 at
both interfaces). Upon thermally triggering phase transitions (0th to 4th phase
transitions, T_h = 50 °C and T_c = 25 °C), Well 1 released green tracer owing to the
attractive F_edl and the elastic forces (the elastic force for heating, equation (3);
the elastic-strain and interfacial-tension forces for cooling, equation (7)) generated
by the motion of the N-I interface. By contrast, no release was observed from Well 2
because the attractive elastic forces did not override the repulsive F_edl. After four
phase transitions, aqueous DTAB (C_DTAB = 3 mM) was introduced into the bath to
reverse the charge at the interface between the LCs and the overlying aqueous phase
from ζ < 0 to ζ > 0 (and thus F_edl < 0 in Well 1 and F_edl > 0 in Well 2).
Consequently, the elastic forces were able to trigger the release of microcargo from
Well 2 (F_edl > 0; red tracer) but not Well 1 (F_edl < 0, green tracer).
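
The selection rule in this experiment can be summarized as a sign check on the two zeta potentials. The sketch below is a qualitative illustration only: it encodes the sign convention used in the text (F_edl > 0, attractive, when the LC-bath interface and the microdroplet interface carry opposite charge) and assumes that a thermally triggered release occurs only when F_edl > 0. It ignores force magnitudes, and the function names and the zero-charge case are assumptions made here.

```python
def edl_force_sign(zeta_interface, zeta_droplet):
    """Sign of the double-layer force: +1 attractive, -1 repulsive, 0 if uncharged.

    Only the signs of the two zeta potentials are used (assumption: the
    magnitude of the force is not modelled in this sketch).
    """
    if zeta_interface == 0 or zeta_droplet == 0:
        return 0
    opposite = (zeta_interface > 0) != (zeta_droplet > 0)
    return 1 if opposite else -1


def releases_on_phase_transition(zeta_interface, zeta_droplet):
    """True when the attractive double-layer force lets the interface eject cargo."""
    return edl_force_sign(zeta_interface, zeta_droplet) > 0


# Droplet charge signs taken from the text: DTAB-doped (+), SDS-doped (-).
wells = {"Well 1 (DTAB, green tracer)": +1, "Well 2 (SDS, red tracer)": -1}

for bath_name, zeta_bath in (("SDS bath", -1), ("bath after DTAB addition", +1)):
    for well_name, zeta_droplet in wells.items():
        outcome = releases_on_phase_transition(zeta_bath, zeta_droplet)
        print(f"{bath_name}: {well_name} releases = {outcome}")
```

Under these assumptions the rule gives release from Well 1 only in the SDS bath and from Well 2 only after DTAB is added, which is the pattern reported above.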

Temperature control. The temperature was controlled using a STC200 hot stage
and a controller (Instec Inc.) with an accuracy of 0.1 °C. Both heating and cooling
were achieved by circulation of cold water. The rate of temperature change was
typically ±15 °C min^-1.

Measurement of the mass of tracer released. 6 μl of aqueous solution was
collected from the baths contacting the LCs after each N-I phase transition (Fig. 1
and Extended Data Figs. 1i and 11f) or every 2-5 min (Fig. 2g and Extended Data
Fig. 5j). Prior to collection of a sample, the baths were gently agitated to uniformly
mix the tracer released from the LCs through the overlying aqueous solution.
We estimated the mass of tracer released from the absorbance of the samples at a
wavelength corresponding to peak tracer absorbance, which was measured using
a NanoDrop 2000 (Thermo Scientific) spectrophotometer.
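
As a rough illustration of how an absorbance reading can be converted to a released mass, the sketch below assumes a linear Beer-Lambert calibration. The text does not state which calibration was used, so the relation itself, the mass extinction coefficient, the optical path length and the bath volume are all assumptions, and the numbers in the example are placeholders rather than values from this study.

```python
def tracer_mass_ug(absorbance, epsilon_l_per_g_cm, path_cm, bath_volume_ml):
    """Estimate released tracer mass (micrograms) from a peak absorbance.

    Assumes Beer-Lambert behaviour, A = epsilon * path * concentration,
    with epsilon expressed per unit mass concentration (l g^-1 cm^-1).
    All parameter values are hypothetical inputs supplied by the caller.
    """
    if epsilon_l_per_g_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    concentration_g_per_l = absorbance / (epsilon_l_per_g_cm * path_cm)
    # g/l multiplied by ml gives mg; multiply by 1000 for micrograms.
    return concentration_g_per_l * bath_volume_ml * 1000.0


# Placeholder numbers: A = 0.5, epsilon = 50 l g^-1 cm^-1, 1 mm path, 2 ml bath.
example_mass = tracer_mass_ug(0.5, 50.0, 0.1, 2.0)
```

Because the bath is mixed before sampling, the concentration in the small aliquot is taken to represent the whole bath, which is why the estimate scales with the bath volume rather than the sampled volume.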

Zeta potential measurement. 5CB (0.01 v% > C_5CB > 0.001 v%) was emulsified
in aqueous solution (water or aqueous solutions of SDS or DTAB) using a homogenizer.
The zeta potentials on the aqueous side of the LC-aqueous interface were
measured using a Zetasizer Nano instrument (Malvern Instruments Ltd).

Additional observations on the transport of microdroplets by propagating N-I
interfaces. We made two additional observations in the experiments shown in
Fig. 1j, k and Extended Data Fig. 2. First, we observed single microdroplets or
microdroplet clusters with R < R* to be transported initially by the moving N-I
interface (denoted by dotted circles in Extended Data Fig. 2c, d, g and h). However,
as the moving interface formed bigger clusters with R > R* by collecting additional
microdroplets, we observed some microdroplets from the cluster to be left behind
the interface, as illustrated in Extended Data Fig. 2i-l. West et al.17 observed similar
behaviours with solid particles and attributed them to an increase in the effective
radius of the particles due to aggregation. Importantly, this observation provides
insight into why only a fraction of the microdroplets was released at each phase
transition (Fig. 1g).

Second, we observed microdroplets to occasionally coalesce upon heating
(denoted by white arrows in Extended Data Fig. 2c, d). Consequently, large
microdroplets formed, which were observed to remain behind the moving N-I
interface. This latter observation provides insight into why the amount of tracer
released after 20 cycles corresponded to approximately 40% of the tracer loaded
initially into the 5CB (Fig. 1i). Overall, these results indicate that the fraction of
microdroplets released can be manipulated by tuning the clustering size and
coalescence of microdroplets.

Data availability. The authors declare that the main data supporting the findings
of this study are available within the paper and in Supplementary Information.
Additional data are available from the corresponding author upon reasonable
request.


31. Bogi, A. & Faetti, S. Elastic, dielectric and optical constants of 4′-pentyl-4-
cyanobiphenyl. Liq. Cryst. 28, 729-739 (2001).

32. Raynes, E. P., Tough, R. J. A. & Davies, K. A. Voltage dependence of the
capacitance of a twisted nematic liquid crystal layer. Mol. Cryst. Liq. Cryst. 56,
63-68 (1979).

33. Kim, J.-W., Kim, H., Lee, M. & Magda, J. J. Interfacial tension of a nematic liquid
crystal/water interface with homeotropic surface alignment. Langmuir 20,
8110-8113 (2004).

34. Israelachvili, J. N. Intermolecular and Surface Forces 3rd edn (Elsevier Science,
Burlington, 2010).

35. Zimmermann, N., Junnemann-Held, G., Collings, P. J. & Kitzerow, H.-S.
Self-organized assemblies of colloidal particles obtained from an aligned
chromonic liquid crystal dispersion. Soft Matter 11, 1547-1553 (2015).


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


36. Faetti, S. & Palleschi, V. Measurements of the interfacial tension between
nematic and isotropic phase of some cyanobiphenyls. J. Chem. Phys. 81,
6254-6258 (1984).

37. Andrienko, D., Tasinkevych, M., Patricio, P. & da Gama, M. M. T. Interaction
of colloids with a nematic-isotropic interface. Phys. Rev. E 69, 021706
(2004).

38. Shah, R. R. & Abbott, N. L. Coupling of the orientations of liquid crystals to
electrical double layers formed by the dissociation of surface-immobilized salts.
J. Phys. Chem. B 105, 4936-4950 (2001).

39. Brown, M. A. et al. Determination of surface potential and electrical double-layer
structure at the aqueous electrolyte-nanoparticle interface. Phys. Rev. X 6,
011007 (2016).

40. Janik, J., Krol-Otwinowska, A., Sokolowska, D. & Moscicki, J. K. Pendulum
viscometer: a new method for measurement of Miesowicz nematic shear
viscosity coefficients η1 and η2. Rev. Sci. Instrum. 77, 123906 (2006).

41. Holmberg, K., Jonsson, B., Kronberg, B. & Lindman, B. Surfactants
and Polymers in Aqueous Solution (John Wiley & Sons, Chichester,
2002).

42. Harth, K., Shepherd, L. M., Honaker, J. & Stannarius, R. Dynamic interface
tension of a smectic liquid crystal in anionic surfactant solutions. Phys. Chem.
Chem. Phys. 17, 26198-26206 (2015).

43. Ong, L. H. & Yang, K.-L. Surfactant-driven assembly of poly(ethylenimine)-
coated microparticles at the liquid crystal/water interface. J. Phys. Chem. B 120,
825-833 (2016).




[Extended Data Fig. 1 image residue. Axis label: Time [day]. Panel labels: Initial; Cooling (50 °C to T_c = 34 °C).]

Extended Data Fig. 1 | Thermally triggered ejection of microdroplets.
a, b, Mass of tracer released from LCs (5CB) as a function of time, before
(a) and after (b) an N-to-I phase transition (corresponds to Fig. 1).
Black and red points indicate the mass released before and after an N-I
phase transition, respectively. c-e, Sequential photographs of the release
of microdroplets (C_SDS = 9 mM and red tracer) from the LC, which is
triggered by phase transitions induced by resistive heating. We used
C_aq = 20 v% and 30 V for heating (to T_h = 60 °C, d) and 0 V for cooling
(to room temperature, e). Heating of the sample from below was achieved
by passing a current through an indium-tin-oxide electrode coated on
glass. The motion of the N-I interface was upward for both heating and
cooling. f-h, Dependence of the direction of motion of the N-I interface
on T_h and T_c. f, Optical images showing propagation of the N-I interface
across a mini-well filled with 5CB (containing no microdroplets) in an
aqueous bath upon heating from T = 25 °C to T_h = 50 °C (N-to-I phase
transition). Upon heating the sample from below, the interface moved
upwards (towards the interface between the LCs and the overlying
aqueous phase) regardless of T_h. g, Upward motion of the N-I interface
upon cooling from T = 50 °C to T_c = 25 °C (I-to-N phase transition).
h, Downward motion of the N-I interface upon cooling from T = 50 °C to
T_c = 34 °C (I-to-N phase transition). Scale bars, 5 mm. i, Released mass of
tracer as a function of the number of phase transitions with T_h = 50 °C and
T_c = 34 °C; C_aq = 20 v% (C_SDS = 9 mM).

Extended Data Fig. 2 | Behaviours of microdroplet clusters during
passage of the N-I interface. a-h, Sequential micrographs showing the
behaviours of microdroplet clusters dispersed in an LC (5CB) during
passage of an N-I interface upon an N-to-I (heating, a-d) and an I-to-N
(cooling, e-h) phase transition. We used C_aq = 3 v% (C_SDS = 2 mM) and
measured v_NI = 10 μm s^-1 for heating, v_IN = 35 μm s^-1 for cooling and
R* = 10 μm for both cases. Scale bar, 100 μm (see Methods, 'Additional
observations on the transport of microdroplets by propagating N-I
interfaces' for more details). Red and blue arrows indicate the direction
of motion of the N-I interface. Solid and dotted circles indicate
microdroplets with R > R* (= 10 μm) and R < R*, respectively. White
arrows indicate microdroplets that coalesced while being transported
by the moving N-I interface. We note that microdroplets with R < R*
were left behind the N-I interface in c because they were shed from
clusters, as illustrated in i-l. i-l, Illustration of a microdroplet cluster
being transported by a moving N-I interface. i, Single microdroplets
or microdroplet clusters with R < R* are transported by a moving N-I
interface. j, As the moving interface collects more microdroplets, the
microdroplet clusters formed at the interface increase in size. k, When
the effective radius of a microdroplet cluster exceeds R*, the interface
no longer transports the cluster. l, Because some of the microdroplets from
the cluster are left behind the N-I interface, the cluster becomes smaller
than R* and thus is transported again by the interface. m-r, Evidence that
microdroplets with R > R* are not transported by an N-I interface moving
at high speed (v_NI = 100 μm s^-1) during N-to-I (m-o) and I-to-N (p-r)
phase transitions. Here C_aq = 0.5 v% and C_SDS = 2 mM. Scale bar, 100 μm.
The left (m, p) and right (o, r) columns show optical micrographs before
and after passage of the N-I interface, respectively, and the middle column
(n, q) shows micrographs taken during passage of the N-I interface. The
positions of microdroplets before and after the passage of the N-I interface
were unchanged, revealing that R > R* for the rapidly moving interface.

Extended Data Fig. 3 | Transport of microdroplets by an N-I interface
propagating across an LC film. a-j, Sequential micrographs (a-e, top
view) and corresponding illustrations (f-j, side view) of microdroplets
transported by a moving N-I interface upon heating in an LC film (5CB,
40 μm in thickness). The focal plane is near the interface between the LC
and the overlying water (red boxes in f and h). a, f, Microdroplets are
dispersed initially in the LC bulk. b, g, When the bottom of the LC film
is heated to T_h = 50 °C (> T_NI), the N-to-I transition first occurs at the
LC-glass interface (denoted by the asterisk in b) and the N-I interface
propagates upwards, towards the LC-water interface. c, h, Microdroplets
that were out of focus (red dashed circles in a and f) move into focus,
revealing that the moving interface transported the microdroplets towards
the LC-water interface. d, i, As the N-I interface reaches the LC-water
interface, the microdroplets disappear, consistent with their fusion
with the overlying aqueous phase. e, j, After the phase transition, some
microdroplets remain in the LC layer. k-n, Micrographs showing the
decrease in the population of microdroplets in the LC at T = 25 °C; before
any phase transitions (k) and after 2 (l), 4 (m) and 6 (n) phase transitions.
Scale bars, 50 μm. P and A indicate the orientations of the polarizer and
analyser, respectively. T_h = 50 °C, T_c = 25 °C, C_aq = 5 vol% (C_SDS = 9 mM)
and v_NI = 8 μm s^-1.

Extended Data Fig. 4 | Calculated net force F^net(z) acting on a
microdroplet and calculated dependence of R* on v_NI during an I-to-N
phase transition (cooling). a, F^net(z) for a quasi-static microdroplet of
R = 1.5 μm in 5CB. The insets show microdroplets at z > R (black line; (i)),
at −R < z < R (blue line; (ii)) and at z < −R (red line; (iii)). b, Critical
radius R* of a microdroplet as a function of v_NI upon cooling. See Methods
for details.

Extended Data Fig. 5 | Influence of LC phase behaviour and elastic
properties on dynamics of release of dispersed microdroplets.
a-h, Sequential photographs showing continuous release (a-d) of
dispersed microdroplets (green tracer and C_SDS = 2 mM) from a nematic
LC (E7) in response to a thermal trigger T_h < T_NI^E7 (ρ_E7 > ρ_aq), and
pulsatile release (e-h) of dispersed microdroplets (red tracer and C_SDS = 9 mM)
from 5CB (ρ_5CB < ρ_aq, T_h > T_NI^5CB) at 0 min (a, e), 15 min (b, f), 60 min
(c, g) and 120 min (d, h) after heating of the baths from below to
T_h = 59 °C. We used C_aq = 30 v% for a-d and C_aq = 20 v% for e-h. Scale bar,
5 mm. i, Corresponding time derivative of the released mass m of the
tracer (dm/dt; t, time) for pulsatile (red line and circles) and continuous
(green line and circles) release. j, Released mass of green tracers from E7
(continuous release) with respect to time at representative temperatures of
T = 40 °C (blue triangles), 50 °C (circles) and 59 °C (red triangles). The
data are mean values and the error bars are 1 s.d. (n=5). k, Forces acting 
on an aqueous microdroplet in E7. 1, m, Calculated net force F, Haga acting on 
a microdroplet in E7 as a function of R at h; =0 (1) and as a function of hy 
for R =25\1m (m) at T= 40°C (blue line), 50°C (black line) and 59°C 

(red line). See Methods for details. 

Extended Data Fig. 6 | Isothermal release of microdroplets from an LC
by a solute-triggered N-to-I phase transition. a-d, Sequential
photographs of a solute-triggered N-to-I phase transition of 5CB at
T = 25 °C (T_NI^5CB = 35 °C) at 0 h (a), 1 h (b), 2 h (c) and 3 h (d) after the
addition of propanol to the overlying water. As the propanol diffused into
the 5CB, an N-to-I transition occurred first at the interface between the LC
and the aqueous solution and propagated into the LC bulk. e, Illustration
of inverted mini-wells filled with 5CB containing microdroplets (red
tracer), placed in baths containing pure water (left) and water containing
propanol (C_propanol = 16 v%; right). f-h, Sequential photographs of the
mini-wells at 0 min (f), 5 min (g) and 30 min (h) after the mini-wells were
submerged into the baths; C_aq = 10 v% (C_SDS = 9 mM). Although F_b
(ρ_LC < ρ_aq) promotes the release of tracers, no release of red tracers was
observed in the left bath owing to strong elastic sequestration. In the right
bath, however, the red tracers were continuously released as the elastic
barrier was removed by the solute-induced N-to-I phase transition. Scale
bars, 5 mm.

Extended Data Fig. 7 | Influence of the size and clustering of
microdroplets on release from the LC. a-f, Optical micrographs of
microdroplets (C_SDS = 2 mM and red tracer), with R = 9.5 μm (a-c) and
27 μm (d-f) in an LC (E7) at 25 °C (a, d), 50 °C (b, e) and 59 °C (c, f);
ρ_E7 > ρ_aq and C_aq = 1 v%. Scale bars, 20 μm. The microdroplets were
elastically trapped in the nematic LC bulk at 25 °C. As the temperature
increased to 50 °C (R > 34 μm for release), the microdroplets moved
upwards and into focus, but were not dispensed into the overlying water;
the focal plane was near the interface between the LC and the overlying
aqueous solution. At 59 °C (R > 23 μm for release), we observed the larger
microdroplet (R = 27 μm) to escape into the overlying aqueous phase,
whereas the smaller microdroplet (R = 9.5 μm) remained elastically
trapped in the nematic bulk. This observation is in good agreement with
our theoretical prediction (Extended Data Fig. 5l, m). g-i, Micrographs of
the clustering of microdroplets at 0 min (g), 30 min (h) and 180 min (i)
after they were dispersed in the LC; C_aq = 2 v% (C_SDS = 9 mM). Scale bar,
200 μm. j-o, Illustration (j) and sequential micrographs (k-o) of thermally
triggered, continuous release of microdroplets (C_SDS = 9 mM and red
tracer) from mini-wells filled with E7 containing microdroplets of
different sizes, at 0 min (k), 7 min (l), 10 min (m), 12 min (n) and 15 min
(o) after the baths were heated to T_h = 59 °C (< T_NI); ρ_E7 > ρ_aq (F_b > 0) and
C_aq = 20 v%. Scale bar, 5 mm. The mini-well containing the larger
microdroplets (left bath) exhibited a higher release rate due to the facile
formation of microdroplet clusters with a radius larger than that for
which F^net > 0, consistent with the theoretical model whose results are
shown in Extended Data Fig. 5l, m.

Extended Data Fig. 8 | Role of electrical double-layer interactions in the
release of microdroplets from an LC. a, Calculated repulsive elastic (F_el,
dashed line) and attractive electrical double-layer (F_edl, solid lines) forces
acting on a microdroplet (Csps =9 mM, R= 1.5m) as a function of h; 
following the addition of DTAB with Cpraz = 10 mM (red), 5 mM (orange) 
and 2 mM (black) to the overlying aqueous phase. The inset shows the 
corresponding net forces (F;"* = F,, + Fy). b, F?“' near the interface 
between the LC and the overlying aqueous phase (h, = 4), and initial 
release rate of microdroplets (from 0 to 60 min in Fig. 2g) as a function of 
C_DTAB added into the overlying aqueous phase. c, Illustration of inverted
mini-wells in baths with water (pH 7, left bath) and alkaline water (pH 13,
right bath) at T = 45 °C (> T_NI). d, e, Sequential photographs of mini-wells
filled with 5CB containing aqueous microdroplets (red tracer) at 0 min (d)
and 60 min (e) after an N-to-I phase transition, with C_aq = 10 v%
(C_SDS = 9 mM) and T_h = 50 °C. Scale bar, 5 mm. Because ρ_LC < ρ_aq (F_b < 0),
tracers are continuously released from an isotropic phase of 5CB (F_el = 0)
into the pure water (left bath). In the alkaline water (right bath), however,
the release is suppressed because of the introduction of repulsive charge
interactions between the LC interface in alkaline water (negatively
charged) and SDS-containing aqueous microdroplets (negatively charged).

Extended Data Fig. 9 | Triggering of the release of dispersed
microdroplets by interfacial charge interactions of biological
molecules. a, Zeta potential ζ at the LC-aqueous interface without (white
bar) and with lipopolysaccharides (LPS; green bars) from Escherichia
coli, or with DTAB (grey bar). The data are mean values and the error
bars are 1 s.d. (n > 3). b-e, Micrographs (side view) showing the ejection
of microdroplets (C_DTAB = 2 mM and red tracer) from LC (5CB) 30 min
after addition of C_LPS = 0 mg ml^-1 (b), 1 mg ml^-1 (c), 2 mg ml^-1 (d) and
4 mg ml^-1 (e) to the overlying aqueous phase. f-i, Sequential micrographs
(top view) showing the ejection of microdroplets before (f) and at 0 min
(g), 15 min (h) and 30 min (i) after the addition of C_LPS = 4 mg ml^-1 into
the overlying aqueous phase. Scale bars, 200 μm. P and A indicate the
orientations of the polarizer and analyser, respectively. In the presence of
LPS, microdroplets are ejected continuously from the LC, as evidenced
by the release of red tracer (b-e) and the decrease in the population of
aqueous microdroplets within the thin LC film (40 μm in thickness; f-i).
The release rate is enhanced with an increase in C_LPS (b-e), consistent with
release controlled by interfacial charge interactions.

Extended Data Fig. 10 | Ejection of microdroplets from the LC by
motile bacteria. a-f, Sequential micrographs showing that microdroplets
containing anti-bacterial agent (2 mM DTAB and 1 wt% silver acetate)
are not ejected from the LC (5CB) in the absence of bacteria (a-c) or
after the arrival of weakly motile bacteria (d-f); images were taken at
0 min (a, d), 15 min (b, e) and 30 min (c, f); C_aq = 10 v%. g-i, Sequential
micrographs showing ejection of microdroplets from the LC at 0 min (g),
30 min (h) and 120 min (i) after the arrival of motile bacteria and
subsequent bacterial death and aggregation. Scale bar, 40 μm. g, Motile
bacteria generate shear stresses at the LC interfaces, triggering the release
of microdroplets containing anti-bacterial agent. h, As the anti-bacterial
agents are released, the bacteria become less motile. At 30 min, we observe
dead bacteria and cessation of the triggered release due to the decrease in
the number of motile bacteria.
The amount of silver acetate released is 1-2 pgyl~!. i, Two hours after the 
arrival of motile bacteria, only dead (non-motile) bacteria are observed. 

Extended Data Fig. 11 | LC systems with complex geometries and
responsiveness to multiple stimuli. a-c, Isothermal release of aqueous
microdroplets (C_SDS = 9 mM and red tracer) from a large LC emulsion
droplet (5CB) before (a) and at 50 s (b) and 80 s (c) after the addition of
DTAB (10 mM) into the surrounding aqueous phase; C_aq = 5 v%. Scale bar,
200 μm. The insets in a and b are optical micrographs (crossed polars) of
the LC droplet, showing the optical response. d-f, Selective release of two
agents, triggered by a combination of chemical and thermal stimuli (see
Methods for details). d, In an aqueous SDS bath, the thermal stimulus
triggers release of DTAB-doped microdroplets (green tracer) from Well
1, but not of SDS-doped microdroplets (red tracer) from Well 2 (1st to
4th phase transition; left inset in f). e, Addition of DTAB into the bath
reverses the charge at the interface between the LCs and the overlying
aqueous phase, thus enabling the ejection of microdroplets from Well 2,
but not from Well 1 (5th to 8th phase transition; right inset in f). Red and
blue points in f correspond to the released mass of tracers after N-to-I and
I-to-N phase transitions, respectively. Scale bar, 5 mm.




Extended Data Fig. 12 | Triggered release of water-soluble solid
microparticles from an LC film. a-f, Polarizing (a, d; top view) and
fluorescence (b, e; top view) micrographs and schematic illustrations
(c, f; side view) of an LC film at 25 °C in the initial N phase (a-c) and
after six phase transitions (d-f) with T_h = 50 °C and T_c = 25 °C. Scale
bar, 200 μm. The LC film (5CB, 40 μm in thickness) contained solid
microparticles of FITC-dextran (1-2 wt%). Before any phase transition,
a strong fluorescence signal was detected from the FITC-dextran
microparticles that were sequestered in the LC film (b), but not from the
water bath (g), indicating that the solid microparticles were trapped in
the LC. After six N-I phase transitions, however, no fluorescence signal
was detected from the LC (e), whereas the water bath showed a strong
fluorescence signal (h). The inset in h shows the fluorescence intensity I
from the bath as a function of the number of phase transitions.

Rapid emergence of subaerial landmasses and onset
of a modern hydrologic cycle 2.5 billion years ago

I. N. Bindeman, D. O. Zakharov, J. Palandri, N. D. Greber, N. Dauphas, G. J. Retallack, A. Hofmann,
J. S. Lackey & A. Bekker


The history of the growth of continental crust is uncertain, and 
several different models that involve a gradual, decelerating, or 
stepwise process have been proposed!*, Even more uncertain is the 
timing and the secular trend of the emergence of most landmasses 
above the sea (subaerial landmasses), with estimates ranging from 
about one billion to three billion years ago”-’. The area of emerged 
crust influences global climate feedbacks and the supply of nutrients 
to the oceans®, and therefore connects Earth’s crustal evolution to 
surface environmental conditions? !!. Here we use the triple-oxygen- 
isotope composition of shales from all continents, spanning 3.7 
billion years, to provide constraints on the emergence of continents 
over time. Our measurements show a stepwise total decrease of 0.08 
per mille in the average triple-oxygen-isotope value of shales across 
the Archaean-Proterozoic boundary. We suggest that our data are 
best explained by a shift in the nature of water-rock interactions, 
from near-coastal in the Archaean era to predominantly continental 
in the Proterozoic, accompanied by a decrease in average surface 
temperatures. We propose that this shift may have coincided 
with the onset of a modern hydrological cycle owing to the rapid 
emergence of continental crust with near-modern average elevation 
and aerial extent roughly 2.5 billion years ago. 

Changes in Earth's surface environments between about 2.5 billion 
years ago (2.5 Gyr ago) and 2.32 Gyr ago are recorded in numerous 
isotopic and elemental systems, which point to a dramatic change in 
the oxygenation of the atmosphere and oceans at that time®’°. These 
changes were associated with a series of three to four ‘Snowball Earth 
glaciations'»!?, whose origin and driving forces are still debated. 
The major geochemical and biogeochemical rearrangements in 
Earth’s surface environments at the Archaean-Proterozoic boundary 
(2.5 Gyr ago) also left numerous signatures in the geological record. 
Among these signatures is a steep rise in the oxygen isotopic 18O/16O
ratio (expressed as δ18O, the deviation in the ratio in per mille relative
to the standard ratio in modern seawater, VSMOW) of shales and
zircons in the Late Archaean, followed by a progressively decelerating 
increase in these values into the Phanerozoic!*“, This first-order trend 
was modulated by the assembly and break-up of supercontinents™*. 
However, it is unclear how these changes in δ18O (as well as other
parameters) relate to the isotopic evolution of continental crust, to 
the evolution of meteoric water, or to weathering conditions at Earth's 
surface. 

We present here triple-oxygen-isotope measurements of shales, 
which are the dominant sedimentary rocks on Earth and the prod- 
ucts of the chemical and physical weathering of landmasses that are 
exposed to the atmosphere. Shales consist mainly of clay minerals, 
secondary quartz and unmodified detrital minerals; studies of shales 
have been used previously to constrain the chemical evolution of Earth’s 
crust through time’?*!>'°, The triple-oxygen-isotope composition 
of shales is expressed here as δ′18O and Δ′17O values; the latter parameter
reflects linearized deviations in per mille (‰) of 17O/16O ratios
relative to a mass-dependent 17O/16O versus 18O/16O fractionation
line with a reference slope of 0.5305 (see Fig. 1 and Supplementary
Information section ‘Methods’ for details). Both parameters are inde- 
pendent functions of temperature and oxygen-isotope variations and 
fractionation processes in surface environments!”~!’. 
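
As a worked illustration of the Δ′17O parameter, the sketch below uses the linearization that is common in the triple-oxygen-isotope literature, δ′ = 1000 ln(δ/1000 + 1), together with the reference slope of 0.5305 quoted above. The exact definitions used in this study are given in its Supplementary Information, so the linearization formula and the function names here should be read as assumptions.

```python
import math

REFERENCE_SLOPE = 0.5305  # reference slope quoted in the text


def linearize(delta_permil):
    """Convert a conventional delta value (per mille) to linearized delta-prime.

    Assumes the common convention delta' = 1000 * ln(delta / 1000 + 1).
    """
    return 1000.0 * math.log(delta_permil / 1000.0 + 1.0)


def cap_delta_17o(delta17_permil, delta18_permil, slope=REFERENCE_SLOPE):
    """Deviation of linearized delta-17O from the reference line, in per mille."""
    return linearize(delta17_permil) - slope * linearize(delta18_permil)


# A sample constructed to lie exactly on the reference line has a deviation of zero.
d18 = 10.0
d17_on_line = 1000.0 * (math.exp(REFERENCE_SLOPE * linearize(d18) / 1000.0) - 1.0)
deviation_on_line = cap_delta_17o(d17_on_line, d18)
```

Negative results from `cap_delta_17o` correspond to samples that plot below the reference line, which is the direction of the shift reported for the younger shales.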

The δ′18O-Δ′17O signature of bulk shales is defined by: first, the
proportions of detrital versus authigenic mineral components; 
second, the temperature of weathering and diagenesis, which affects 
isotopic fractionation factors; and third, the isotopic composition of the 
altering water (Fig. 1). Although used extensively in the past, 18O/16O
ratios alone'*°”! are insufficient for disentangling the impact of these 
various processes on the shale composition. However, as we show 
here, the combined use of δ′18O and Δ′17O removes the ambiguities
associated with using δ18O alone, and allows us to reconstruct past
surface conditions and the composition of meteoric waters involved 
in weathering. 

The shale samples used here are the same as those used previously", 
with the addition of 30 composite samples (formation-averaged) and 
10 individual recent and Archaean samples (Fig. 1, Extended Data 
Tables 1-3). These 278 samples were collected from outcrops and 
drill holes on all continents and span 3.7 Gyr. The measured δ18O
values agree with previously determined values for shales and other 
sediment types and detrital zircons (Fig. 2)”!*'*'®. The bulk shales 
cover a large field in the δ′18O-Δ′17O space (Fig. 1). The observed
AO values range from those typical of mantle and crust (—0.05%o 
to —0.09%o)'*? to —0.3%o. Furthermore, the triple-oxygen-isotope 
data of shales fall on different mass-dependent fractionation lines, with 
slopes between the 18O/16O and 17O/16O ratios ranging from 0.529 to
0.516 (Fig. 1, Supplementary Information)—values that encompass 
the entire range of slopes described previously for mass-dependent 
processes on Earth”. 

Our results confirm a gradual trend of increasing δ′18O values from
3.7 Gyr ago towards modern times. Meanwhile, the Δ′17O values
of shales exhibit a stepwise shift to more negative and more diverse
values during and after the Archaean-Proterozoic transition (Fig. 2a).
Shales older than 2.5 Gyr define an average Δ′17O value of −0.047‰ ±
0.012‰ (2σ), while shales younger than 2.2 Gyr, deposited in the later
stages of and after the Great Oxidation Event (GOE) roughly 2.32 Gyr ago,
during which O2 appeared in the atmosphere, yield an average Δ′17O
value of −0.118‰ ± 0.024‰. This difference in the triple-oxygen-isotope
composition of the two age groups cannot be explained solely
by different equilibration temperatures, or by the mixing of different
proportions of variably weathered detrital materials, as the shale record
cuts across the δ′18O-Δ′17O trends defined by these processes (Fig. 1),
requiring meteoric waters with different initial δ18O values. Moreover, there is no
difference in the chemical index of alteration (CIA”’) or in the pro- 
portions of minerals (determined by X-ray diffraction, XRD) of the 
studied shales across the GOE (Extended Data Table 2, Fig. 2) that 
could explain the observed shift in the A’!”O values of the shales. The 
invariable titanium-isotope values® and constant characteristic elemen- 
tal ratios of the studied shales'* (Extended Data Fig. 1) suggest that 
Fig. 1 | Triple-oxygen-isotope systematics of ancient and modern
terrestrial materials. a, Ancient materials. b, Modern materials.
The coloured fields in b correspond to the ancient-shale data shown
with coloured dots in a. The concave blue curves represent isotopic
fractionation between weathering products and meteoric water. The labels
on the blue curves show the δ′¹⁸O values of meteoric waters, ranging from
−5‰ to −40‰, and of modern ocean water (black open circle; 0‰).
The modern meteoric water line is labelled MWL. In a, the fractionation
curves for bulk shale/water equilibration were constructed assuming a
weathering product of 70% illite and 30% quartz (see Supplementary
Information). In b, the fractionation curves are based on experimentally
determined quartz/water triple-oxygen-isotope fractionations. In a, the
grey convex mixing curve connects unweathered terrestrial materials
(such as mantle or upper crust) with the weathering products located


the parental, atmospherically exposed continental crust undergoing
weathering was similar in chemical composition to the modern crust,
and has had similar proportions of mafic and felsic rocks since at least
3.5 Gyr ago, with a greater contribution of komatiites in the Archaean.
Higher-temperature Archaean oceans, or a greater contribution of
hydrothermal clays to Archaean shales, would result in less-positive
δ′¹⁸O and less-negative Δ′¹⁷O values. Although this could help
to explain the lower δ′¹⁸O values that we observe, it cannot explain
the vertically extending trend of lower Δ′¹⁷O values (Fig. 1, Extended
Data Fig. 2) or the step change in Δ′¹⁷O values 2.5 Gyr ago (Fig. 2). In
addition, our shales show no geological or mineralogical evidence for a
substantial change in hydrothermal contribution, or in weathering
intensity as recorded by the CIA, across the Archaean–Proterozoic boundary.
Taking a cue from the modern world, where meteoric water shows
variable δ′¹⁸O–Δ′¹⁷O compositions, the simplest explanation for
some of the oxygen-isotope variations measured in ancient shales
is that they were in part inherited from the waters involved in rock
alteration on the continents.

We applied recently established isotope-fractionation factors
for ¹⁸O/¹⁶O and ¹⁷O/¹⁶O between quartz and water at different
temperatures to translate our measured, raw δ′¹⁸O–Δ′¹⁷O data for
shales into surface weathering conditions. We also calculated
the equilibrium fractionation of oxygen isotopes between bulk shale
and water, at different temperatures and initial δ′¹⁸O–Δ′¹⁷O water
values along the meteoric water line (MWL; Fig. 1, Extended Data
Figs. 3–5). Oxygen-isotope fractionation between clay minerals and
water under low temperatures is less than that for water and secondary
quartz, but the two mineral/water pairs follow the same fractionation
law (ref. 18; see also Supplementary Information and Extended Data
Fig. 6). The bulk shale/water fractionation lines (blue curves in Fig. 1a)
define a negative slope in δ′¹⁸O–Δ′¹⁷O space. Mixing detrital material
derived from the continental crust with authigenic minerals follows
a subparallel curve (grey line in Fig. 1a) within the field defined by
the isotope-fractionation curves. The calculation shows that variations
in temperature, in the initial oxygen-isotope composition of altering
waters, and in mixing ratios between detrital material and weathering
on the blue fractionation lines; the grey secondary silicification curve
connects modern high-δ′¹⁸O and low-Δ′¹⁷O materials (such as cherts and
sponge spicules; individual data are shown in b). In b, the various Earth
materials analysed here (red symbols) and in previous studies (black symbols) are
normalized to a mantle Δ′¹⁷O value of −0.05‰; the double-headed red
arrow, with a slope of 0.528, connects coexisting hydrothermal quartz
(Qz) and epidote (Ep) from the modern deep-ocean core 504B, which
formed at about 300 °C in equilibrium with roughly 0‰ seawater. In
a, the blue arrows and associated values depict the slopes of triple-oxygen
fractionation in linearized δ′¹⁷O–δ′¹⁸O space, with the values reflecting
logarithmic linearization of the delta scales (see Supplementary
Information); 0.5305 is characteristic of infinitely high temperatures and
smaller slopes are characteristic of lower-temperature fractionations. IL,
illite; VSMOW, Vienna Standard Mean Ocean Water.


products can explain the overall trend and negative co-variations in 
our data. 

The proportion of weathering products in a shale can be assessed
independently via its mineralogical composition (through XRD)
and/or its chemical composition (such as through the CIA).
Combining this estimate with the δ′¹⁸O and Δ′¹⁷O values of the
shales and the pristine detrital components (igneous rocks) allows us
to calculate, by mass balance, the δ′¹⁸O and Δ′¹⁷O values of the
weathering products (Extended Data Fig. 3). The CIA index has
remained nearly constant through time (Extended Data Fig. 7),
suggesting that a secular trend in weathering intensity is unlikely
to introduce a systematic bias in this approach. The equations for
isotopic fractionation during water–rock interactions and the equation of
the MWL in δ′¹⁸O–Δ′¹⁷O space (Fig. 1) can be used to independently
calculate the water–rock interaction temperature and the oxygen-isotopic
composition (δ′¹⁸Ow and Δ′¹⁷Ow) of waters involved in
weathering and diagenesis (Extended Data Figs. 3, 4). This approach
is likely to be oversimplified, because detrital components in shale
precursors were probably altered by a range of meteoric and diagenetic
waters at different temperatures in watersheds. But, given that we
compare shales with shales, all of which have comparable CIAs, these
complexities do not affect the first-order interpretations afforded by
quantitative modelling.
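
The mass-balance step described above can be sketched as a two-component unmixing. This is an illustration under stated assumptions rather than the authors' procedure: mixing is taken to be linear in conventional δ values (so linearized δ′ inputs are converted back before mixing), the reference slope of 0.5305 is the one quoted in the Fig. 1 caption, and `f_weathered` is a hypothetical input standing for the fraction of weathering products estimated from XRD or CIA data. All function names are invented for the example.

```python
import math


def to_delta(d_prime: float) -> float:
    """Linearized delta-prime (per mil) -> conventional delta (per mil)."""
    return 1000.0 * (math.exp(d_prime / 1000.0) - 1.0)


def to_prime(delta: float) -> float:
    """Conventional delta (per mil) -> linearized delta-prime (per mil)."""
    return 1000.0 * math.log(delta / 1000.0 + 1.0)


def unmix(shale_prime: float, detrital_prime: float, f_weathered: float) -> float:
    """Solve shale = f * product + (1 - f) * detrital for the product."""
    if not 0.0 < f_weathered <= 1.0:
        raise ValueError("f_weathered must lie in (0, 1]")
    shale = to_delta(shale_prime)
    detrital = to_delta(detrital_prime)
    product = (shale - (1.0 - f_weathered) * detrital) / f_weathered
    return to_prime(product)


def weathering_product(shale_d17, shale_d18, detr_d17, detr_d18,
                       f_weathered, slope=0.5305):
    """Return (delta-prime-18O, cap-delta-prime-17O) of the weathering product."""
    d17 = unmix(shale_d17, detr_d17, f_weathered)
    d18 = unmix(shale_d18, detr_d18, f_weathered)
    return d18, d17 - slope * d18


# Illustrative call: shale and mantle-like detrital values in per mil,
# with a hypothetical weathered fraction of 0.6.
d18_wp, cap17_wp = weathering_product(4.802, 9.227, 2.677, 5.140, 0.6)
```

Working in conventional δ space for the mixing step keeps the balance exact; mixing directly in δ′ space would be only approximately linear.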

We find that the temperatures of interacting water and rock derived
from the inversion of shale δ′¹⁸O–Δ′¹⁷O values were higher during
the Archaean than after it (Extended Data Fig. 5). This modelling
exercise also shows that, although our measurements reveal
trends towards heavier average δ′¹⁸O and lighter average Δ′¹⁷O
values in shales over time, the waters involved in surface alteration
and weathering processes became lighter in δ′¹⁸O, heavier in Δ′¹⁷O,
and more variable in both δ′¹⁸O and Δ′¹⁷O after roughly 2.5 Gyr ago.
Although the quantitative analysis makes important simplifications, it
does capture the essential features of weathering conditions on continents
through time.

Another explanation for the stepwise change in Δ′¹⁷O values during
the GOE could be the appearance of atmospheric oxygen and ozone. 
Fig. 2 | Oxygen-isotopic compositions of shales through time. a, Δ′¹⁷O
record of shales (red points) and calculated oxygen-isotopic composition
of weathering products (yellow points), showing a stepwise change in
composition between 2.5 Gyr ago and 2.2 Gyr ago. The blue dashed lines
represent the mean Δ′¹⁷O values, weighted by the composite sample
size (n), for shales before 2.5 Gyr ago and after 2.2 Gyr ago. The boxes
represent the medians, interquartile ranges and extreme values (see
legend). A t-test reveals that the Archaean and post-Archaean Δ′¹⁷O


The formation of ozone with highly positive Δ′¹⁷O values (+30‰ to
+100‰) via ultraviolet photolysis in the stratosphere leaves atmospheric
oxygen with the slightly negative Δ′¹⁷O value of −0.3‰ in
today’s 21 vol% oxygen atmosphere, but the anomaly was probably much
smaller under Proterozoic conditions of less than 1 vol% oxygen. However,
not only was the atmospheric Δ′¹⁷O signal small, there is also no mechanism by
which to transfer this signature into the meteoric water cycle and hence
into crustal silicate weathering products such as shales.

As argued above, we favour an explanation by which the difference
in the Δ′¹⁷O values of pre- and post-GOE shales occurred through
a change in the meteoric water cycle (Fig. 3). Starting roughly at the
Archaean–Proterozoic transition, the emerged crust would have interacted
with waters that had more variable and on average more negative
δ′¹⁸Ow values and more positive Δ′¹⁷Ow values than before the GOE
(with Δ′¹⁷Ow shifted by approximately +0.1‰). The observed shift
in the triple-oxygen-isotope composition of shales would also have
required lower post-GOE surface temperatures (Extended Data Fig. 5).
This is broadly consistent with the findings of previous studies of δ¹⁸O
values in cherts.

In the modern world, the oxygen-isotope composition of precipita- 
tion depends on the cumulative history of water loss from the air parcel 
values are statistically distinct (0.004 < P < 0.02, below the
significance threshold of 0.05). We attribute the different δ′¹⁸O and Δ′¹⁷O
values of pre- and post-Archaean shales in this diagram to a change from a
coast-dominated to a more continental hydrological cycle and weathering
conditions (see text for details). The blue vertical bars at the top indicate
major glacial episodes. b, δ′¹⁸O record of shales (red points) from our
dataset superimposed on the dataset from ref. 14 (white diamonds).


that is travelling inland away from the coasts towards higher latitudes
and higher altitudes (http://www.waterisotopes.org), resulting in lower
δ′¹⁸Ow values, higher Δ′¹⁷Ow values, and more diverse compositions
overall (as in the MWL on Fig. 1). This combined effect, which
we call ‘continentality’ (Fig. 3), is shown through the shale record’s
step change that coincides with the Archaean–Proterozoic boundary
(Figs. 2a, 3). It is most likely that the observed change in the shale
triple-oxygen-isotope record reflects the appearance of larger continents
(Fig. 3) and higher elevations from the Proterozoic onwards, a
period that is broadly contemporaneous with the final stages in
the assembly of the first documented supercontinent, Kenorland,
or with the formation of several supercratons immediately
before the GOE. Supercontinent assembly and orogenic events
result in high mountain ranges and plateaus, as occurred when India
collided with Asia, forming the Himalayas and Tibet. Rising mountains
in even quite low to mid latitudes result in precipitation with
very light δ′¹⁸O values (http://www.waterisotopes.org) that correlate
with elevation, with a roughly 2‰–3‰ drop in δ′¹⁸O per kilometre
of altitude gain.
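
To make the altitude effect concrete, the sketch below applies the roughly 2‰–3‰ per kilometre decrease quoted above to a starting precipitation value. The function name and the example inputs (a lowland value of −5‰ and a 4 km elevation gain) are assumptions chosen for illustration, and the linear lapse is a simplification.

```python
def d18o_range_at_elevation(d18o_lowland: float, elevation_km: float,
                            lapse_low: float = 2.0, lapse_high: float = 3.0):
    """Return (lightest, heaviest) expected precipitation d18O in per mil.

    Applies a linear decrease of lapse_low to lapse_high per mil per km.
    """
    heaviest = d18o_lowland - lapse_low * elevation_km
    lightest = d18o_lowland - lapse_high * elevation_km
    return lightest, heaviest


# Hypothetical example: -5 per mil at low elevation, 4 km of altitude gain.
print(d18o_range_at_elevation(-5.0, 4.0))  # (-17.0, -13.0)
```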

Supporting our interpretations of triple-oxygen variations in shales 
is the strontium-isotope record of marine carbonates, which suggests 


Fig. 3 | Conceptual palaeohypsometry of
Archaean and Proterozoic worlds. These
images are based on palaeomagnetic and 
tectonic reconstructions (see refs !*?° and 
references therein). a, The late Archaean 
era (shortly before 2.5 Gyr ago) during the 
assembly of the supercontinent Kenorland. 
b, The early/mid Palaeoproterozoic era, after
the occurrence of the GOE. The oceans are 
shallower in a compared with b, and the 
(excessively flooded) continents are smaller 
and lower, resulting in different hydrologic 
and weathering cycles, as described here. 
that the area of emerged continental crust increased substantially and
irreversibly at roughly the Archaean–Proterozoic boundary. From a
geodynamic perspective, models of a cooling Earth call for a thickening
of the lithosphere and the establishment of a higher continental freeboard
by about 2.5 Gyr ago owing to increased mantle viscosity.
The emergence of large landmasses (Fig. 3) would also have led to a
larger weathering sink for carbon dioxide, which occurred at greater
concentrations in the Archaean, resulting in a transition to moderate
surface temperatures after the GOE. We conclude that this set of large-scale
tectonic and near-surface changes best explains the observed shift
in the δ′¹⁸O–Δ′¹⁷O composition of shales between 2.5 Gyr ago and
2.2 Gyr ago (Fig. 2).

The rapid increase in Earth’s subaerial surface and overall hypsometry
2.5 Gyr ago that we infer here (Fig. 3) would have also increased
Earth’s albedo, the flux of nutrients to the oceans⁸ from continents
undergoing subaerial weathering, and the extent of continental
margins, additionally resulting in a higher rate of burial of organic
carbon and reducing the concentration of carbon dioxide in the air.
Together, these changes could have contributed to the cooling of the
planet and to the snowball glaciations of the early Palaeoproterozoic,
followed by the GOE, highlighting how Earth’s interior could have
influenced surface redox conditions and chemistry. The most
dramatic change in Earth’s history was marked by a transition from
hot and largely anoxic surface conditions to an oxygenated atmosphere
with moderate surface temperatures. Our study suggests that this
transition might have been modulated by long-term cooling of the
subcontinental mantle and lithosphere, rendering it capable of
supporting a thicker crust. This would have led to the emergence
of extensive landmasses at the Archaean–Proterozoic boundary, with
life and surface conditions adjusting to, rather than triggering, the
change in atmospheric oxygen concentrations.


Data availability 
Data are provided in Extended Data Tables 1-3. 
Extended Data Fig. 1 | Comparison of isotopic and key elemental ratios
of the shales studied here, illustrating the relative constancy of the
composition of the exposed crust that is undergoing weathering, and
in particular the proportion of exposed mafic versus silicic rocks. See, for
example, refs '39. The 6*°Ti data are from ref. °, which used a large dataset 


33. McLennan, S. M., Hemming, S., McDaniel, D. K. & Hanson, G. N. Geochemical 
approaches to sedimentation, provenance and tectonics. Geol. Soc. Am. Spec. 
Pap. 284, 21-40 (1993). 
that included many of the samples studied here. The elemental data are
from Extended Data Table 3 and ref. 14. Variation in the composition of
the exposed crust cannot explain the oxygen-isotope trends that we have
identified here.
Extended Data Fig. 2 | Distribution density plot for the samples studied here. Data from Fig. 1.
Extended Data Fig. 3 | Calculating the δ′¹⁸O and Δ′¹⁷O values
of the weathering products of shales. Solving a system of three
unknowns (for example, the temperature of alteration and the values of
Δ′¹⁷O and δ′¹⁸O along the MWL) in three equations (Supplementary
Information equations (1), (5) and (6)) follows these steps: the results
of Supplementary Information equations (1), (5) and (6) are substituted
sequentially into each other until one unknown is left. This results in a
function with respect to temperature (T, a single parameter) to solve:


y=0=36.323T — 23.337" + 0.002647? — 1.38 x 10’. This is a third-order 
polynomial equation that has three roots; we solve for roots in a realistic
temperature range to obtain the temperature and δ′¹⁸Ow. The concave blue
curve originating from a point on the MWL shows isotope fractionation
between shale and water with the indicated temperatures of equilibration.
The grey curve represents a mixing line between detrital and weathering
products of the indicated proportions, computed using the CIA.
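
The root-selection step described in this caption (solve a cubic in T, then keep only roots in a realistic temperature range) can be sketched as follows. The coefficients below are hypothetical, built from (T − 290)(T + 150)(T − 900) so that exactly one root falls in the chosen window; they are not the coefficients of the equation in the caption. The temperature unit (kelvin) and the window limits are also assumptions, and the scan-plus-bisection method is a generic choice rather than the authors' solver.

```python
def cubic(coeffs, t):
    """Evaluate a*t^3 + b*t^2 + c*t + d using Horner's scheme."""
    a, b, c, d = coeffs
    return ((a * t + b) * t + c) * t + d


def roots_in_range(coeffs, lo, hi, steps=2000, tol=1e-9):
    """Real roots of a cubic in [lo, hi): sign-change scan plus bisection.

    Roots lying exactly on the upper bound, and double roots that do not
    produce a sign change, are not detected by this simple scan.
    """
    roots = []
    width = (hi - lo) / steps
    for i in range(steps):
        left = lo + i * width
        right = lo + (i + 1) * width
        f_left = cubic(coeffs, left)
        f_right = cubic(coeffs, right)
        if f_left == 0.0:
            roots.append(left)
            continue
        if f_left * f_right < 0.0:
            # Bisect until the bracketing interval is narrower than tol.
            while right - left > tol:
                mid = 0.5 * (left + right)
                f_mid = cubic(coeffs, mid)
                if f_left * f_mid <= 0.0:
                    right = mid
                else:
                    left, f_left = mid, f_mid
            roots.append(0.5 * (left + right))
    return roots


# Hypothetical cubic: (T - 290)(T + 150)(T - 900), expanded.
example_coeffs = (1.0, -1040.0, 82500.0, 39150000.0)
# Assumed realistic window for water-rock interaction, in kelvin.
print(roots_in_range(example_coeffs, 273.15, 673.15))
```

Only the root near 290 survives the window; the other two roots of the example polynomial (−150 and 900) are discarded as unphysical for this purpose.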
here the effects of varying the proportion of quartz (Q) in shales (±20%),
Extended Data Fig. 5 | Calculated temperature, δ′¹⁸Ow and Δ′¹⁷Ow
values plotted against age of shales. a, δ′¹⁸Ow versus age; b, temperature
versus age; c, Δ′¹⁷Ow versus age; d, δ′¹⁸Ow versus temperature. These values
are based on the solution of three equations with three input parameters:
δ′¹⁸Oshale, δ′¹⁷Oshale and the proportion of quartz (see Extended Data
Fig. 3). Sensitivity analysis is provided in Extended Data Fig. 4. Most
measurements yielded solved roots (plus signs within symbols); when
equations could not be solved, small variations in the input parameters
(Extended Data Table 1) allowed us to find roots. In particular, correcting
for secondary silicification (see Extended Data Table 1) by decreasing
Δ′¹⁷Oshale by 0.01‰ to 0.08‰ along the silicification line allowed us to
find roots in all but two cases. Note that the overall calculated δ′¹⁸Ow and
temperature ranges agree with modern and recent values for surface,
diagenetic, basinal and pore waters measured in drillholes. Note also


34. Frieling, J. et al. Paleocene-Eocene warming and biotic response in the 
epicontinental West Siberian Sea. Geology 42, 767-770 (2014). 


that the absolute calculated values for the temperature of water–rock
interaction (weathering) and δ′¹⁸Ow depend on the assumed isotopic
fractionations in Extended Data Fig. 6; however, given that these values
solved within realistic bounds, the adopted fractionations and the MWL
are probably well constrained and calibrated in absolute
triple-oxygen-isotope space. The lowest δ′¹⁸Ow and temperature are
computed for recent clay samples from Antarctica, and for 2.5–2.2-Gyr-old
synglacial Palaeoproterozoic shales, confirming the participation of
low-δ¹⁸Ow synglacial waters in diagenesis, as proposed previously¹⁴. The
highest recent temperature and δ′¹⁸Ow values are for Palaeocene–Eocene
(55-million-year-old) thermal maximum shales (ref. 34 and Extended Data
Table 1). e–g, Interquartile range statistics and running averages for the
parameters computed in a–d.
¹⁸O/¹⁶O fractionation factors. The quartz/water and illite/water
1,000lnα(¹⁸O/¹⁶O) fractionation factors are based on previously published
calibrations. The bulk-shale/water 1,000lnα(¹⁸O/¹⁶O) fractionation factors
are based on the assumption that bulk shale comprises 70% illite and 30%
quartz (that is, Q = 0.3). The blue line (for quartz/water fractionation)
corresponds to Supplementary

illite/water fractionation) is the best-fit second-order polynomial with
two fit coefficients, based on a published equation that includes three fit
coefficients. We used these coefficients to solve equations for bulk-shale/water
triple-oxygen-isotope fractionations according to the proportion of
quartz (determined through X-ray diffraction; Extended Data Table 2).
Extended Data Table 1 | Triple-oxygen-isotope analyses of the shale samples used here


measured linearized and normalized Weathering product _ _Weathering waters_ 
drillcore|Composite 
so A", 50 , 
AGE, | 6%, | 50, | A", * 150, %o, i Avo | ao | @, (cs), _|Sample ae " eg] “Ow, | 8Ow |A”ow 
Ga | %,av | %, av | %, av | 9") | norm | %» | | CAE | CIA cor) Oy ol cia cor] outcrop| individual | size,n | Ref Sample Description BMT t | Quartz TCO che | x 


0 4.269 8.238 -0.101 0.022 4.426 8.387 0.024 2 9.775 0.022 5.208 d cs 18 36 Carribean Sea, Core 34 of Swedish Deep Sea Expedition 1947, secondary carbonate present 0.192 0.3 62 11.4 -6.0 0,029 
0 4.786 9.378 -0.189 0.011 4.940 9.518 -0.109 3 12.035 0.148 6.236 ° is 1 Holocene silt, Oregon 0.3 15-182 -96 0.046 
0.00 1,155 2.480 -0.160 0,031 1,320 2.661 -0.091 3 100 2.661 0,091 1,320 ° is x Suislaw River, Oregon clay +silt , river mud, Florence, Oregon 03 39 =-22.0 -11.6 0.055 
0.00 ©1450 3.070 -0.177 0.002 1615 3.249 -0.109 2 100 3.249 0109 1.615 ° is 1 Suislaw River, Oregon clay +silt , Florence, Oregon 03 27 -242 -12.8 0.060 
0.05 0.030 0.376 0.17 0.015 0.196 0.560 -0.101 2 100 0.560. 0.101 0.196 ° is 1 63 sample 2367, Antarctic claystone, Plio-Pleistocene, Transantarctic Mt, 85°S 0.01 18 = -25.3. -13.4 0.063 
0.05 0.030 0.405, 0.16 0,019 0.196 0.588 0.116 5 100 0.588 0.116 0.196 ° is i 63 sample 2367, Antarctic claystone, Plio-Pleistocene, Transantarctic Mt, 85°S. 0.01 9 -27.7 -14.6 0.069 
0.01 9.775 19.053 -0.333 0.005 9.893 19.057 0.217 3 19.057 0.217 9.893 ° is * 64 F108534_A, Late Miocene, Nevada, non-marine diatomite 0.99 8 -22.5  -11.9 0.056 
0.018 15.533 30.112 -0.441 0.008 15.580 29.852 0.257 3 29.852 0.257 15.580 ° is 1 38 locality UO011095 Cabrillo Beach Museum parking lot, California: Valmonte Diatomite: middle Miocene 0.99 3 135 -7.1 0.034 
0.02 6513 12.706 -0.228 0.012 6.657 12.809 0.138 4 12.809 0.138 6.657 ° is 1 39 R3888, kaolinite of lateritic profile, Long Reef Beach, near Deewhy, NSW, Australia: middle Miocene 0.01 14 -14.1 -7.4 0,035 
0.035 3.634 7.150 -0.159 0,021 3.793 7.308 0.084 3 7.308 0.084 3.793 ° is 1 John Day fossil beds, Oregon, kaolinite. Oligocene 0.01 46 -12.6 6.7 0,032 
0.035 4.836 9.452 -0.178 0.013 4,990 9.591 -0.098 3 9.591 0.098 4.990 ° is 1 John Day fossil beds, Oregon, smectite, Oligocene 0.3 48 -132 69 0.033 
0.045 4.647 9.084 -0.172 0.007 4.802 9.227 0.093 3 74.0 11.640 0.117 6.058 o cs J 40 Tyee Fm, Triangle Lake area, Oregon Coastal Ranges 0.3 360 -13.6 -7.2 0.034 
0.051 9.739 18.709 -0.186 0.017 9.839 18.720 0.091 4 100 18.720 -0.091 9.839 d is 1 34 Russkaya Polyana Fm, claystone, W Siberia, Omsk Region, well 10, EECO excursion layer, depth 195m 0.3 85 18 09 0.004 
0,052, 10.060 19.391 -0.227 0.016 10.157 19.389 -0.129 3 100 19,389 0.129 10.157 d is 1 34 Russkaya Polyana Fm, claystone, W Siberia, Omsk Region, well 10, EECO excursion layer, depth 198m 0.3 42 4.5 2.4 0.011 
0.055 7.689 14.798 -0.161 0.027 7.804 14.873 0.083 4 100 14.873 0.083 7.807 d is 1 34 Lulinvor Formation, W Siberia, Omsk region, well 10, PETM excursion claystone layer depth 236.85 03 67 -2.9 -15 0.007 
0,056 8.314 15.930 -0.137 0.016 8.375 15.988 -0.107 3 100 15,988 0.107 8.375 d is x 34 Lulinvor Formation, W Siberia, Omsk region, well 10, PETM excursion claystone layer depth 236.9 0.3 45 5.4 -2.8 0.013 
0.52 7.908 15.312 -0.215 0,007 8.042 15.380 -0.117 3 23.759 0.163 12.441 ° cs 2 41 Poleta Formation, Poleta Folds, Eastern California, USA 0.235 0.3 24 4.2 -2.2 0.010 
0.75 9.022 17.767 -0.403 0.005 9.148 = 17.795 0.293 3 85.4 22.247 0.385 11.417 o cs 12 42 Cerro Espuelitas Formation, Arroyo Del Soldado Group, Uruguay, numeric age tentative 0.318 03 no roots 
08 7.400 14.385 -0.231 0,014 7.538 14.466 -0.1359 3 75.0 21,932 0.202 11,433 d cs 16 43,44,14 Wynniatt Fm, Shaler Supergroup, Victoria Island, N Canada, GNME drill core 07-04, sec, carbonate present 0.184 0.3 1 12.6 6.7 0.032 
0.75 9.937 19.140 -0.217 0.024 10.053 19.143 -0.102 2 71.3 35.504 -0.145 18,690 d cs 13 14,44 Mwashya Subgroup, Upper Roan Group, Katanga Supergroup, Zambia, Drillhole RCB1, below glacials 0147 (203 0« «12-49-26 0.012 
0.98 5.589 10.748 -0.113 0.014 5.739 10.874 -0.030 2 14.748 0.010 7.834 ° cs 8 45,46,14 Upper part of the Ui Group, Lakhanda Series, Uchur-Maya Region, Siberia, Russia 0.190 0.3 360 --12.7) 6.7 0.032 
1,025 8.852 17.021 -0.177 0,012 8.979 17.061 0.072 2 27.122 0.074 14.314 ° cs 5 45,46,14 Neryuen Formation, Lakhanda Series, Uchur-Maya Region, Siberia, Russia 0.190 0.3 23 1.2 -0.6 0.003 
1.13 7.086 13.684 -0.173 0.009 7.227 13.775 0.081 2 20.550 -0.092 10.810 o cs 12 14,37 Bylot Supergroup, northern Baffin and Bylot Islands, Canada, secondary carbonate present 0.133 0.3 14 = -10.0 5.3 0.025 
1.25 7.774 15.058 -0.214 0.020 7.910 15.129 -0.116 8 74.0 23.936 0.167 12.532 ° cs 18 — 47,45,14 Totta, Talyn, Trekhgornaya and Dim formations, Uchur-Maya Region, Yudoma-Maya trough, Siberia, Russia 0.175 0.3 23 45 -2.4 0.011 
1.47 8.260 15.940 -0.197 0,022 8.391 15.998 0.096 2 81.2 21.420 0.111 11,252 d cs 4 14 Newland Formation, Lower Belt Supergroup, Montana, USA, drill core SC-93 0.218 0.11 Ss 9.3 49 0,023 
1.47 7.441 14.398 -0.198 0.0048 7.579 14.479 0.102 4 78.7 20.039 0.127 10.504 d is 1 14 $C93-1906, Newland Fm, Lower Belt Supergroup, Montana, drill core SC-93, depth 1906, studied by XRD 0.226 0.11 39 -2.3 -12 0.006 
1.83 5.148 9.988 -0.151 0.021 5.300 10.122 0.070 2 79.5 12.292 0.070 6.451 d cs Bh 48,14 Rove Formation, near Thunder Bay, Ontario, Canada, drill core 89-MC-1 0.166 0.3 96 3.2 -17 0.008 
2.0 5.905 11.517 -0.205 0,012 6.053 11.634 -0.119 2 80.8 14.523 0,149 7.556 d cs 10 49,14 Maraloou Formation, southeastern Capricorn Orogen, Australia, drill core KDD1 0.101 03 19 -148 -7.8 0,037 
2.06 6.636 13.098 -0.312 0.009 6.780 13.196 0.221 4 64.0 29,129 0.609 14.844 d cs 4 51 Zaonega Formation, Lower Ludikovian Series, Karelia, Russia, drill cores 175 and 5190 0.107 0.3 no roots 
2.15 7.152 13.934 -0.240 0.006 7.292 = 14.022 0.146 3 77.9 19.583 0.206 10.182 d cs 20 51,14 Sengoma Argillite Formation, Pretoria Group, Lobatse, Botswana, drill core STRAT 2 0.177 0.3 10 -12.1 6.4 0.030 
2.30 4.830 9.453 -0.185 0,018 4.984 9,592 0.105 4 79.3 11.424 0,130 5.931 d cs 10 =~ 51,52,14 Timeball Hill Fm, Lower Pretoria Group, Transvaal Basin, S Africa, drill cores EBA-1 and 2, above all glacials 0.237 0.3 26 «-16.0 -85 0,040 
2.22 4.977 9.668 -0.152 0.0065 5.130 9.805 0.072 5 63.9 17.128 -0.077 9.010 d is 1 14 EBA-1 569.2 drillhole EBA-1, individual sample from depth 569.2, studied by XRD 0.86 132 08 0.4 -0.002 
2.32 4.910 9.601 -0.184 0.015 5.063 9.738 0.103 2 81.5 11.352 0.122 5.900 d is 4 14 EBA-1 1093 m, Lower Timeball Hill Fm, S Africa; immediately above Huronian glacials, studied by XRD 0.184 0.41 35) (15.5 -B.2 0.039 
2.32 5.633 11.059 -0.234 0,023 5.783 11.182 0.149 2 79.9 13,984 0,202 7.216 d is 1 14 EBA-1 1151.7m, Lower Timeball Hill Fm, S Africa; immediately above Huronian glacials, studied by XRD. 0.188 035 7 -19.4 -10.2 0,049 
2.32 4,203 8.264 -0.181 0.002 4.359 8.413 0.104 2 76.4 9.675 0.134 4,998 d is 1 14,52 H-68-3, 3410m, Eplett drillhole, Gowganda Fm, ON, Canada, above all Huronian glacials, studied for XRD 0.142 0.395 23 --19.7 -10.4 0.049 
24 4.535 8.850 -0.160 0.003 4.690 8.994 0.082 2 72.8 11.370 -0.095 5.936 d is 1 14,12  Drillcore A-77-10, 441 m, Pecors Fm (379-678'), L Huronian Supergroup, ON, Canada, studied by XRD 0.137 0.203 «52 9.6 5.1 0.024 
24 4.469 8720 -0.157 0,015 4.625 8866 -0.079 4 745 10.809 0,088 5.646 d is 1 14 Drillcore 150-4, 4912 m, Pecors Fm, L Huronian Supergroup, Canada, above the oldest Huronian glacial 0.3 63 9.4 5.0 0.024 
24 4.468 8.712 -0.154 0.018 4.624 8.858 0.075 14 74.5 10.792 -0.081 5.644 d is 1 14 Drillcore 150-4, Pecors Fm, Lower Huronian Supergroup, Canada, above the oldest Huronian glacial 0.3 ee -8.0 -4.2 0,020 
2.44 4.705 9.179 -0.163 0.020 4.860 9.321 0.085 8 81.6 10.678 -0.094 5.570 d is i 14,12  Drillcore DDO4, 98.8m, Turee Creek Fm, Hamersley, W Australia, above oldest glacial, studied by XRD 0.151 0.23 53) -10.5 5.5 0.026 
2.44 4393 8603 -0.171 0,001 4.549 8.750 -0.093 2 73.4 10.739 0.120 5.577 d cs 10 14,12 Pecors and McKim Fm, L Huronian, Canada, drill cores A-77-10, A-77-9 ,143-4,150-4, below all glacials 0.120 03 32-153 --8.1 = 0.038 
25 6.250 12.024 -0.129 0,003 6.396 12.136 0.042 2 17.271 0.014 9.148 d cs 12 14,53 Mt McRae Shale, Hamersley Basin, W Australia, drillcores WLT-02, WLT-10, secondary carbonate present 0.209 0.3 51 49 -2.6 0,012 
2.75 4819 9.300 -0.115 0.018 4,973 9.441 -0.036 2 844 10.553 -0.020 5.578 d cs 16 14,53 Manjeri Fm, Bubi and Gweru Greenstone Belts, Zimbabwe, drillcores 690B92-02,696C92-01,711B93-02 0.191 0.3 134 = -1.0 0.5 0.003 
2.95 3.960 7.757 -0.156 0,034 4.117 7.911 -0.079 5 826 8.397 0,084 4,370 d cs 3 55 Monzaan Group, Pongola Supergroup, South Africa, drillcore 0.143 03 32-178 «-9.4 (0.044 
2.95 3.860 7.491 -0.114 0.011 4.018 7.646 0.038 4 82.7 7.988 -0.021 4.217 ° is 1 14,56,57 DRB-3 , W Rand Group, Government Subgroup, Witwatersrand Supergroup, S Africa, sample from mine 0.190 0.3 1130 «-5.7 3.0 0.014 
3.2 5.376 10.393 -0.138 0.014 5.527 10.523 0.055 3 69.7 15.943 -0.033 8.425 d cs 13 14,58 Clutha Fm, Moodies Group, Barberton Greenstone Belt, S Africa, short drillcles, few % secondary silicification 0.172 0.3 1360 4.5 24 0.011 
3.20 5.643 10.955 -0.156 0.015 5.793 11.079 0.085 3 65.3 20,306 0.119 10.653 d is x 14,58  Drillcore 28-148, 43.3m, Clutha Fm, Moodies Gp, Barberton Greenstone Belt, S Africa, some sec. silicification 0.158 03 28 6.8 -3.6 0.017 
3.40 4.640 9.007 -0.138 0.011 4.795 9.150 0.059 3 76.4 11.072 -0.049 5.825, o is % 14,59 KS-1, Kromberg Fm Onverwacht Group, Barberton Greenstone Belt, South Africa, minor sec. silicification 0.253 0.3 76 7A 3.8 0.018 
35 6.737 12.952 44 0.006 6.880 13.052 0.044 2 73.3 19.987 -0.015 10.588 ° cs 7 14,60 Daitari Greenstone Belt, India, uncertain age, probably 3.5 Ga, some secondary silicification 0.168 0.3 58 1.1 -0.6 0.003 
3.75 5.779 11.117 .119 0.006 5.928 11.239 0.035 3 15.479 0.001 8.212 oO is i 61 sample R5310 , chloritized siltstone, surface outcrop, Isua greenstone belt, Greenland 0.01 79 0.6 03 0.002 
3.80 3.172 6150 -0.091 0,014 3.333, 6.315, 0.017 3 65.3 4.765 0,102 2.630 ° er 62 Chinese granite -mylonitized, oldest rock in China 3.8Ga 
mantle 2.545 5.003 -0.121 0.034 2.677 5.140 -0.049 SCO average (n=12) 
mantle 2.681 5.153 -0.053 0.007 - : SCO analyzed by Pack et al., 2016 Rapid Comm Mass Spectrometry, 30, 1495. 
mantle 2.690 5.280 -0.101 0.015 ‘SCO analyzed Pack and Herwartz, 2014 , EPSL 390, 138. 
Details of the shale samples are taken from refs 12, 14, 36–64.

*Normalized to the mantle value, −0.05‰ Δ′¹⁷O.
†Data from ref. 8.

‡CIA is calculated using X-ray fluorescence (XRF) analysis of rocks where available. A generic average CIA of 75 is assumed for samples without chemical analysis or for which inductively coupled
Extended Data Table 2 | Quantitative XRD data for a selection of studied shales


1 oy * | al 9 17, j 
pushed re ae Lee PK fate] | oe [roa | ome | 
‘0 ‘0 


SC-93_1906 1.47 64.00 78.7 14.30 14.40 -0.198 106+ 29 88.4 + 99.0 
EBA-1_569.2 2.22 86.30 63.9 9.50 9.67 -0.152 70.14 43 11 14.44 1.2 95.5 
EBA-1_1093 2.32 61.00 81.5 10.10 9.60 -0.184 38.54 2.5 55.9 + 94.4 Amphibole (5.7%) 
EBA-1_1151.7 2.32 66.90 79.9 11.74 11.06 -0.234 32.54 1.5 60.6 + 93.1 Amphibole (5.7%) 
H-68-3_3410 2.32 60.10 76.4 8.34 8.26 -0.181 38.3 + 1.2 58.7 3+ 0.4 100.0 
A-77-10_441 2.40 59.00 72.8 9.47 8.85 -0.160 203+ 2.3 74 5.0 + 1.1 99.3 
DD04_98.8 2.46 61.50 81.6 8.52 9.18 -0.163 222+ 1.0 74.3 3.3 + 0.3 99.8 
89-MC-1_795 1.85 61.50 70.5 14.50 23.8+ 1.3 62.9 10.3 + 1.0 97.0 
EBA-1_574.7 2.32 66.00 71.3 7.79 406+ 2.8 49.3 10.1+ 6 100.0 
EBA-1_803.7 2.32 63.10 77.1 7.37 38.8 + 1 612 + 100.0 
DD04_112.5 2.47 58.80 82.0 9.22 23.8+ 0.6 76 03+ 0 100.1 


*Run as CO₂ for δ¹⁸O only; previously published.
Extended Data Table 3 | New XRF analyses of samples studied here


Barberton 
W Rand Greenstone 
Group, 


Mwashya 
Daitari Subgroup, 
Greenstone Upper Roan 


= Group, 
Belt, | Bylot 
etencls Supergroup] Kromberg ylo Group 


S Africa Formation, coe eae 
South Africa 
mean] ow [ww [ww [ww [wo [ww | w | 
core/outcrop ° ° ° mine ° ° ° c Cc c c c Cc e 
Age 3.5 3.5 3.5 2.95 3.40 1.27 1.27 1.13 0.75 0.75 0.75 0.75 0.75 0.75 
n 1 1 1 1 1 1 1 1 3 1 1 1 1 1 4 
SiO2 92.65 96.72 74.54 72.40 75.19 61.76 67.41 74.74 67.97 61.58 64.90 59.81 61.17 59.78 61.45
TiO2 0.16 0.08 0.74 0.46 0.75 0.63 1.47 0.67 0.92 1.69 1.21 0.83 1.10 1.51 1.27
Al2O3 5.17 1.77 16.80 11.43 16.96 12.77 18.38 15.33 15.50 16.76 17.81 19.15 11.50 14.74 15.99
Fe2O3 0.23 0.42 2.43 9.96 1.84 4.47 3.42 4.58 4.15 10.14 7.62 14.20 17.54 8.67 11.63
MnO 0.00 0.05 0.00 0.10 0.00 0.02 0.05 0.04 3.52 0.09 0.06 0.10 0.08 0.05 2.59 
MgO 0.14 0.19 0.32 3.53 0.28 7.89 1.63 1.03 0.04 2.30 2.02 2.75 2.84 3.04 0.08 
CaO 0.05 0.04 0.00 0.29 0.11 5.47* 0.40 0.09 0.24 1.15 0.45 0.39 1.33 5.03* 0.83
Na2O 0.00 0.10 0.00 0.00 0.00 1.72 3.08 0.05 1.62 2.70 2.18 0.26 1.85 3.16 2.03 
K2O 1.53 0.54 5.00 1.61 4.65 4.84 3.89 3.25 3.99 3.24 3.43 2.27 2.33 3.57 2.97
P2O5 0.02 0.02 0.04 0.03 0.04 0.19 0.07 0.03 0.09 0.13 0.10 0.09 0.11 0.21 0.13
Total 99.96 99.94 99.87 99.81 99.82 94.29 99.79 99.80 97.96 99.78 99.78 99.85 99.85 94.75 98.80 
CIA 74.74 69.60 75.60 82.70 76.60 41.48 65.14 81.62 62.75 63.16 69.40 85.18 59.83 44.7 64.46 
ppm 
Rb 38 24 140 66 116 126 156 134 139 181 126 120 150 159 147 
Sr 18 3 rd 140 89 32 87 164 87 4 98 265, 131 
Ba 35 345, 273 298 343, 501 279 374 375 541 229 188 403 347 
Zr 51 Tf 168 101 177 146 230 588 321 300 247 142 181 247 223 
¥ urd 12 13 14 25 19 30 33 10 22 29 25 
Nb 9 6 13 10 an 14 42 18 24 52 44 22 30 37 37 
Cs 4 27 18 4 
Sc 3 3 14 8 1 14 ai 9 15 27 21 16 8 23 19 
V 21 15 94 88 137 464 179 70 237 156 142 109 99 166 134
Cr 46 247 78 416 206 67 122 55 81 114 93 98 46 76 85 
Ni 10 ut 18 276 77 52 12 24 29 24 20 36 26 26 
Cu 5 42 94 56 9 33 2 33 2 17 14 
Zn 34 ani vas 153 319 87 101 169 123 172 119 168 37 124 
Ga 4 AT 13 20 15 20 16 17 24 22 20 15 21 21 
La 25 25 32 23 27 44 At 37 70 51 61 51 66 60 
Ce 59 22 39 42 18 67 120 78 88 96 106 106 122 119 110 
Pr 8 1 5 8 4 14 14 8 12 abl 12 14 12 si! 12 
Nd 42 ani 6 1 37 27 25 27 37 37 28 37 33 
Hf 2 
Pb 1 18 5 31 18 13 1 7 
Th 4 1 14 re 3 10 15 9 
Ta 4 
EBA-1 EBA-1 and |A77 and 150) 4143/4 
Ui Fm UI ae if ie a Fe ea ee ey Mersin 28166 | Barbert 
i Fm Upper} imebal rillhole, ‘ormation reel arberton 
part of he SC-93 aMG KDD DERGIAK, |, STRAP? | aa Fm, | Gowganda| (14'-292'), | Group, Formation | wit mt Clutha  |Greenstone 
UiGroup, | Newland |. Rove Maraloo | 220nega | Sengoma |. ae, ciist| Formation,| Lower | Hamersley| 16 | Mcrae | FYG Roy __|Formation,| Belt, 
Lakhanda | Formation, Fonmenon, Formation, Fonmanon: Arallite one, Huronian | Huronian | Province, 2590): Shale, rl Shale: Dominion Moodies |Onverwacht PM, Dattatt 
Sample | Series, | Lower Belt | _,"°2" SE lower’ | Formation, | s siohefstt| Supergroup| Supergroup| Western | Le |emersiey| “an | Fee. | Gow | Group, |oeenstone 
Uchur-Maya|Supergroup] MUPE*” | Gaoricomn | Ludlkovian | Pretoria oom , Cobalt ON, Australia, | [Urentan | province, |. Fm South | Barberton | Kromberg | Belt India, 
ie Bay, Series, Series, m Supergroup} Western Africa 7 slate 
Region, Montana, Ontario. Orogen, Karelia idbates Synclinoriu] area ON, Canada, | above the ON Western Australia Greenstone] Formation, 
Siberia, USA : Australia "| m, South | Canada, above the oldest i! Australia Belt, South] South 
Russia Canada Russia] Botswana |“ atica; | above all | oldest | Huronian | Canada. Africa | Africa 
above all | Huronian | Huronian | glacial | Pelow all 
, : glacials 
glacials glacials glacial 
core/outcrop ° c c c c c c c c c c c c c c ° ) 
Age 0.98 1.47 1.85 1.87 2.05 2.15 2.22 2.32 24 2.46 2.47 2.5 2.66 2.95 3.2 3.4 3.5 
n 5 2 2 3 4 3 3 3 ug 3 4 4 4 1 4 1 3 
SiO2 59.96 63.7 56.2 64.34 64.13 66.12 70.53 64.54 58.22 60.05 58.26 46.18 64.29 72.40 60.83 75.19 74.54
TiO2 1.04 0.8 0.9 0.87 0.68 0.75 0.59 0.73 0.86 0.62 0.87 0.43 0.67 0.46 0.56 0.75 0.33
Al2O3 15.66 20.8 22.6 18.55 16.88 17.10 16.29 20.80 24.13 18.09 23.70 12.77 15.76 11.43 15.01 16.96 7.92
Fe2O3 9.96 8.2 7.0 7.21 4.20 6.11 6.07 6.71 5.67 9.90 6.20 4.39 6.10 9.96 7.84 1.84 1.03
MgO 2.45 1.3 2.8 3.74 2.40 2.84 1.09 1.64 2.31 7.36 2.09 10.29 3.40 3.53 5.08 0.28 0.22 
MnO 0.05 0.0 0.0 0.63 0.05 0.02 0.02 0.20 0.04 0.04 0.04 0.41 0.08 0.10 0.17 0.00 0.02 
CaO 0.41 0.1 0.4 0.41 2.77 0.97 0.70 0.59 0.25 0.16 0.22 3.56 1.40 0.29 0.62 0.11 0.03 
K2O 1.88 4.7 5.4 3.03 1.08 3.60 3.56 3.77 7.52 3.46 7.58 7.67 7.11 0.00 5.44 0.00 0.03
Na2O 4.39 0.2 0.8 0.86 7.50 2.23 0.93 0.70 0.61 0.05 0.63 0.16 0.22 1.61 1.67 4.65 2.36
P2O5 0.46 0.1 0.1 0.10 0.09 0.11 0.09 0.14 0.11 0.12 0.12 0.08 0.08 0.03 0.10 0.04 0.03
Total 99.40 99.8 96.2 99.74 99.79 99.86 99.86 99.82 99.72 99.84 99.69 85.93 98.76 99.81 99.37 99.82 99.92 
LOI 0.2 3.8 0.26 0.22 0.14 0.14 0.18 0.28 0.16 0.31 0.00 1.24 0.63
CIA 74.80 78.8 74.3 76.49 56.60 65.45 70.48 77.07 71.95 82.22 70.82 60.72 82.70 62.70 76.60 73.31 
ppm 
Rb 193 129 225 148 37 184 183 181 329 140 320 224 151 66 142 116 67 
Sr 29 71 87 36 202 88 84 89 44 Q 36 60 32 3 247 ts 18 
Ba 563 337 1182 1805 662 857 422 626 1302 412 1151 377 406 273 712 298 190 
Zr 246 230 159 137 123 164 176 157 146 138 156 108 247 101 121 177 78 
Y 29 36 37 20 32 30 30 28 19 26 23 53 12 21 13 17 
Nb 18 17 16 16 1 14 15 17 12 10 13 12 23 10 10 1 9 
Sc 20 12 19 25 14 17 1 15 18 20 19 24 15 8 14 1 7 
V 114 222 196 88 233 102 137 168 123 167 75 109 88 107 137 43
Cr 92 105 168 179 126 140 96 147 164 151 159 98 188 416 658 206 123 
Ni 53 37 181 7 117 4 64 72 76 89 84 118 276 291 77 35 
Cu 14 33 58 10 94 32 47 22 3 63 24 121 42 31 94 
Zn 49 163 187 55 170 94 102 64 70 56 355, 96 77 84 153 22 
Ga 25 31 24 20 24 20 26 31 24 30 17 25 13 19 20 1 
La 54 42 53 95 55 34 53 51 43 52 43 39 54 32 35 23 25 
Ce 89 88 108 200 119 74 116 110 91 93 88 85 119 42 68 18 40 
Pr 9 lt 21 13 9 13 13 9g 1 9 an W 8 v4 4 5 
Nd 37 4 40 62 39 23 52 45 37 43 39 26 51 6 23 26 
Hf 7 rs 7 5 4 5 6 6 5 5 5 4 6 3 2 
Pb 8 14 6 16 18 75 15 5 15 20 14 ‘Al 16 18 
Th 16 16 23 20 10 22 19 22 18 16 22 17 14 4 4 
Ta 6.15 


bdl, below detection limit; LOI, loss on ignition; p.p.m., parts per million. 
*Secondary carbonate present. 
Large potential reduction in economic damages
under UN mitigation targets 


Marshall Burke1,2,3*, W. Matthew Davis2 & Noah S. Diffenbaugh1,4


International climate change agreements typically specify global 
warming thresholds as policy targets1, but the relative economic
benefits of achieving these temperature targets remain poorly
understood2,3. Uncertainties include the spatial pattern of
temperature change, how global and regional economic output 
will respond to these changes in temperature, and the willingness of 
societies to trade present for future consumption. Here we combine 
historical evidence4 with national-level climate and socioeconomic®
projections to quantify the economic damages associated with the 
United Nations (UN) targets of 1.5°C and 2°C global warming, 
and those associated with current UN national-level mitigation 
commitments (which together approach 3°C warming’). We find 
that by the end of this century, there is a more than 75% chance 
that limiting warming to 1.5°C would reduce economic damages 
relative to 2°C, and a more than 60% chance that the accumulated 
global benefits will exceed US$20 trillion under a 3% discount 
rate (2010 US dollars). We also estimate that 71% of countries— 
representing 90% of the global population—have a more than 75% 
chance of experiencing reduced economic damages at 1.5°C, with 
poorer countries benefiting most. Our results could understate the 
benefits of limiting warming to 1.5°C if unprecedented extreme 
outcomes, such as large-scale sea level rise®, occur for warming of 
2°C but not for warming of 1.5°C. Inclusion of other unquantified 
sources of uncertainty, such as uncertainty in secular growth rates 
beyond that contained in existing socioeconomic scenarios, could 
also result in less precise impact estimates. We find considerably 
greater reductions in global economic output beyond 2°C. Relative 
to a world that did not warm beyond 2000-2010 levels, we project 
15%-25% reductions in per capita output by 2100 for the 2.5-3°C 
of global warming implied by current national commitments’, and 
reductions of more than 30% for 4°C warming. Our results therefore 
suggest that achieving the 1.5 °C target is likely to reduce aggregate 
damages and lessen global inequality, and that failing to meet the 
2°C target is likely to increase economic damages substantially. 

Anticipating the potential impacts of climate change is central to 
planning appropriate policy responses, including how to allocate 
resources among mitigation and adaptation options. By committing 
the international community to holding global warming to “well below 
2°C above pre-industrial levels” and pursuing a 1.5°C target’, the UN 
Paris Agreement increased the need for quantitative analysis of uncer- 
tainties in the costs and benefits of achieving highly resolved warming 
targets. In particular, because mitigation costs are thought to rise rap- 
idly for more stringent targets’, understanding the value of avoided 
impacts (what we term ‘benefits’) is central to evaluating the 1.5°C 
target. Quantification of these potential benefits and their uncertain- 
ties is needed at the aggregate global level to guide coordinated global 
policy, as well as at a more local level to understand the distributional 
impacts of global policy choices’. Further, because the current national 
commitments imply warming’ of 2.5-3°C, quantifying the impact of 
exceeding the 1.5°C and 2°C targets is also critical to understanding 
the implications of policy choices. 


Here we estimate the global and country-specific economic impacts 
of limiting warming to 1.5°C relative to 2 °C, as well as the global 
impacts of projected warming under current mitigation commitments, 
separate from any mitigation costs incurred in achieving those targets. 
We measure potential global and country-level damages using gross 
domestic product (GDP), the total value of goods and services 
produced in a country in a given year. GDP is clearly an incomplete 
summary of the benefits of mitigation, and it cannot easily diagnose 
many sector-specific impacts (for example, in crop agriculture versus 
manufacturing). However, it does capture how sector-specific impacts 
interact and aggregate—a traditional challenge for sector-specific 
empirical work and model-based approaches to aggregation11. GDP
also remains highly relevant to policy discussions, and the level and 
uncertainty in GDP impacts associated with the UN temperature 
targets has not been formally quantified. 

We construct a probabilistic framework (Fig. 1) that incorporates 
uncertainty in (1) the historical relationship between temperature 
variability and economic growth, (2) the spatial pattern of future 
mean annual temperature change associated with a given level of 
aggregate emissions, (3) the future rate and pattern of economic devel- 
opment absent climate change, and (4) how future damages should be 
discounted. 

To estimate the historical relationship between temperature and 
GDP, we use annual measurements of average temperature and growth 
in GDP per capita from 165 countries over the years 1960-2010. 
Following Burke et al.4, we use a fixed-effects estimator that isolates 
the effect of temperature fluctuations from other time-invariant and 
time-varying factors that might be correlated with both temperature 
and economic output, and we estimate nonlinear response functions 
that allow the marginal effect of warming to differ as a function of countries’
average temperatures. To quantify uncertainty in this historical
relationship, we employ multiple bootstrapping approaches, estimating 
a separate response function for each re-sample (see Methods). 

All estimated response functions relating GDP growth to temperature
display a similar concave shape (Fig. 1a), suggesting that additional
warming accelerates growth in cooler regions and slows growth in 
warmer regions. These findings are consistent with a large body of work 
demonstrating nonlinear responses of economic outcomes to changes 
in temperature12–17. However, there is uncertainty in the temperature
at which additional warming begins to generate damages rather than 
benefits (the ‘temperature optimum’), with a median estimate of 13.1°C
but a 5%-95% range of 9.7-16.8°C. Because much of today’s GDP is 
produced in areas just beyond the median estimated optimal tempera- 
ture (density plot, bottom of Fig. 1a), uncertainty in this optimum leads 
to substantial overall uncertainty in both the magnitude and sign of the 
impact of additional warming. 

We project impacts under different levels of future warming by com- 
bining these historical response functions with the Intergovernmental 
Panel on Climate Change (IPCC) projections of future climate18. The
climate model experiments used by the IPCC involve dozens of general 
circulation models (GCMs) run under four forcing pathways (called 
[Axis label: annual average temperature (°C).]
Fig. 1 | Deriving impact projections. a, Historical response of per 
capita GDP growth rates to temperature. Each curve is the response 
function estimated from one of 1,000 bootstraps of a historical regression 
with colour corresponding to the temperature at which it optimizes 
(redder colours for cooler optima). The green, brown and purple dashed 
curves highlight bootstraps at the 10th, 50th and 90th percentiles of 
optimizing temperatures, respectively. The rug plot at the bottom shows 
the distribution of optimizing temperatures across bootstraps using the 
same colour scheme. The density plot in black shows the GDP-weighted 
distribution of baseline average national temperatures. b-d, Projected 
future economic pathways under different historical response functions. 
Black lines represent the pathway of global GDP per capita, assuming 
no future warming. Coloured lines are pathways corresponding to the 
response functions at the 10th, 50th and 90th percentiles highlighted 
in a, under warming projections from 32 GCMs consistent with RCP2.6. 
Points represent values projected for 2099. e-g, Projected climate impact 


representative concentration pathways, or RCPs). Each GCM realization 
contains a temperature trajectory for each country and, in aggregate, 
for the globe. Because temperature affects both the level and the growth 
rate of economic output*"’, and because growth effects compound 
over time, the projected differential impacts of 1.5°C versus 2°C are a 
function of the time horizon. We calculate differential impacts under 
the two targets using temperature changes for the mid-century (2046- 
2065) and end-of-century (2081-2100) periods used by the IPCC, 
focusing on output from those RCPs whose ensemble range spans 
1.5°C and 2°C for a given time period (Methods). We use projections 
from the relevant shared socioeconomic pathways (SSPs) to define 
the secular evolution of population and economic development”, 
(Fig. 1b-d, Extended Data Fig. 2). 
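The role of the time horizon can be seen in a minimal sketch of how a growth effect compounds. The starting income, the 2% baseline growth rate and the 0.2-percentage-point annual growth penalty below are hypothetical values chosen only for illustration, not estimates from this study, and `project_gdp` is an illustrative helper name.

```python
def project_gdp(gdp0, base_growth, growth_penalty, years):
    """Compound per capita GDP forward, one year at a time.

    growth_penalty is a hypothetical temperature-driven change in the
    annual growth rate (negative values slow growth).
    """
    path = [gdp0]
    for _ in range(years):
        path.append(path[-1] * (1.0 + base_growth + growth_penalty))
    return path


# Hypothetical inputs: index income of 100, 2% baseline growth,
# and a 0.2-percentage-point growth penalty from warming.
baseline = project_gdp(100.0, 0.02, 0.0, 80)
warmed = project_gdp(100.0, 0.02, -0.002, 80)

# Percentage gap between the two pathways after 30 and 80 years.
gap_30 = 100.0 * (warmed[30] / baseline[30] - 1.0)
gap_80 = 100.0 * (warmed[80] / baseline[80] - 1.0)
print(f"gap after 30 years: {gap_30:.1f}%")
print(f"gap after 80 years: {gap_80:.1f}%")
```

Because the same annual penalty is applied to an ever-diverging base, the percentage gap at the longer horizon is larger than at the shorter one, which is why mid-century and end-of-century impacts are reported separately.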

Economic impacts are calculated relative to a constant-temperature 
counterfactual and are then aggregated globally (weighting by popu- 
lation), resulting in a unique estimate of global impact for each boot- 
strap-GCM-SSP-year combination. We present two measures of these 
relative impacts: the percentage difference in annual GDP at the end 
of the chosen projection period and the discounted present value of 
absolute GDP differences accumulated over that span. For the second 
measure we employ a range of discounting schemes, including fixed 
rates of 2.5%-5% per annum (where a 5% discount rate assumes that 
society values a given amount of consumption in one year roughly 5% 
less than it values it today) and time-varying rates that depend on the 
levels of and uncertainty in realized growth (Methods). 
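The second measure can be sketched for the fixed-rate case as follows. The stream of annual GDP differences is hypothetical, `present_value` is an illustrative helper, and the time-varying schemes described in the Methods are not covered here.

```python
def present_value(annual_differences, rate):
    """Discount a stream of annual GDP differences to the first year.

    annual_differences[t] is the difference in year t (t = 0 is today);
    rate is a fixed annual discount rate, for example 0.03 for 3%.
    """
    return sum(d / (1.0 + rate) ** t for t, d in enumerate(annual_differences))


# Hypothetical stream: a constant gain of 1.0 (in US$ trillions) for 50 years.
gains = [1.0] * 50

pv_low = present_value(gains, 0.025)
pv_mid = present_value(gains, 0.03)
pv_high = present_value(gains, 0.05)
print(pv_low, pv_mid, pv_high)
```

A higher discount rate shrinks the present value of the same stream, so absolute dollar estimates depend on the rate chosen even when the undiscounted differences are identical.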

We estimate the benefits of 1.5°C versus 2 °C by fitting a linear least- 
squares regression relating either measure of relative economic impact 
[Axis labels: year; global warming (°C).]


on global GDP per capita by 2099 for the same response functions, 
equivalent to the percentage difference between the black points and 
coloured points in b-d. The warming on the x axes is the global warming 
projected for 2099 by GCMs running RCP2.6, relative to a pre-industrial 
benchmark. Red vertical dashed lines mark 1.5°C and 2.0°C warming. 
Linear ordinary least-squares models are fitted for each of the response 
functions, with the slope estimating the per-degree impact of global 
warming on global GDP per capita. Shaded areas represent the 95% 
confidence interval of the ordinary least-squares fit. i, The linear fits from 
e-g, but for all bootstrapped response functions instead of just the three 
highlighted in b-g. The colours correspond to the optimizing temperatures 
of the response functions, as in a. The rug plot at the bottom marks global 
warming for the end of the century (2099) projected by the 32 GCMs 
consistent with RCP2.6, equivalent to the x-axis values of points in 

e-g. h, Equivalent to i but for mid-century (2049) projections based on 42 
GCMs consistent with RCP4.5. 


to the global warming projected by each GCM that archives the RCP 
(Fig. 1e-g). We repeat this procedure for every bootstrapped response
function to arrive at a distribution of estimated impacts for the chosen 
combination of GCM, SSP and projection period. See Methods for a 
full derivation. 
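For a single response function, the fitting step described above can be sketched as follows. The warming and impact values are made up for illustration (they lie exactly on a line), `benefit_of_lower_target` is a hypothetical helper, and the full procedure in the Methods may differ in detail.

```python
import numpy as np


def benefit_of_lower_target(warming, impact, low=1.5, high=2.0):
    """Fit impact = intercept + slope * warming by ordinary least squares
    and return (predicted impact at `low` minus that at `high`, slope)."""
    slope, intercept = np.polyfit(warming, impact, 1)
    benefit = (intercept + slope * low) - (intercept + slope * high)
    return benefit, slope


# Hypothetical ensemble: one point per climate model.
# warming: global warming (deg C); impact: % change in global GDP per capita.
warming = np.array([1.2, 1.5, 1.8, 2.1, 2.4])
impact = np.array([-4.0, -5.5, -7.0, -8.5, -10.0])

benefit, slope = benefit_of_lower_target(warming, impact)
print(f"slope: {slope:.2f} % per deg C, benefit of 1.5 vs 2.0: {benefit:.2f} %")
```

The benefit equals the slope multiplied by the 0.5 °C difference between the targets, with the sign reversed, so a steeper negative slope implies a larger gain from the more stringent target.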

Most response functions generate more negative global impacts at 
2°C than at 1.5°C (Fig. 1h-i, Extended Data Fig. 2). Cooler estimated 
historical optima (red colours) generate steeper negative responses to 
additional warming, implying greater benefits from more stringent 
mitigation. We estimate that limiting warming to 1.5°C instead of 2°C 
by mid-century would lead to an increase in global GDP of 1.5%-2.0% 
(median estimate; Fig. 2a) and US$7.7-11.1 trillion in discounted 
avoided damages under a 3% fixed annual discount rate. Meeting these 
targets at the end of the century is estimated to lead to median gains 
in global GDP per capita of 3.4% and discounted avoided damages of 
US$36.4 trillion. 

We use the distributions of bootstrapped estimated impacts to 
quantify the probability that more stringent mitigation yields benefits 
of different magnitudes (Extended Data Table 1). We estimate that 
achieving the 1.5°C target at mid-century (2046-2065) would lead to a 
68%-76% chance of overall cumulative net benefit relative to 2°C under 
a fixed 3% discount rate. Under the same discount rate, we estimate a 
43%-53% chance of discounted cumulative benefits exceeding US$10 
trillion and a 4%-8% chance of exceeding US$30 trillion, which is about
40% of current global GDP. For the end of the century (2081-2100), 
we estimate a >75% chance of net gain in per capita global GDP, an 
approximately 38% chance that benefits exceed US$50 trillion, and 
Fig. 2 | Global impact of limiting global warming to 1.5°C relative
to 2°C. a, Probability distribution of the percentage change in global
GDP per capita for 1.5°C versus 2°C by mid-century and by the end of
the century, as derived from the slopes of the linear fits across response 
functions illustrated in Fig. 1h-i. Positive values indicate reduced damages 
at 1.5°C of global warming as compared to 2°C. Values above distribution 


an approximately 5% chance that benefits exceed US$100 trillion (3% 
discount rate; Extended Data Table 2). 
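Probabilities of this kind can be read directly off a distribution of bootstrapped estimates as the share of draws above a threshold. The sketch below uses four made-up benefit values rather than results from this study, and `exceedance_probability` is an illustrative helper.

```python
import numpy as np


def exceedance_probability(benefits, threshold=0.0):
    """Fraction of bootstrapped benefit estimates strictly above threshold."""
    benefits = np.asarray(benefits, dtype=float)
    return float(np.mean(benefits > threshold))


# Hypothetical bootstrapped benefits (US$ trillions).
sample = [-5.0, 2.0, 12.0, 35.0]

p_positive = exceedance_probability(sample)       # 3 of 4 draws
p_over_10 = exceedance_probability(sample, 10.0)  # 2 of 4 draws
p_over_30 = exceedance_probability(sample, 30.0)  # 1 of 4 draws
print(p_positive, p_over_10, p_over_30)
```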

While end-of-century estimates of the magnitude of absolute impacts 
are sensitive to choices about discounting (Extended Data Fig. 3, 
Extended Data Table 1), estimates of the probability of positive benefits 


[Axis label: percentage gain in GDP per capita from achieving 1.5 °C versus 2.0 °C.]
Fig. 3 | Country-level impact of limiting global warming to 1.5°C
relative to 2°C. a, b, Median estimates of impacts on change in GDP per 
capita under 1.5 °C versus 2 °C, for mid-century and the end of the century. 
Positive values indicate reduced damages at 1.5 °C of global warming 
as compared to 2 °C. c, Median estimated impacts as a function of each 
country’s baseline GDP per capita, with each country weighted equally. 
[Fig. 2b, c panels: ‘Mid-century, RCP4.5’, x axis ‘Change in cumulative global GDP by 2049 (US$ trillions)’; ‘End of century, RCP2.6’, x axis ‘Change in cumulative global GDP by 2099 (US$ trillions)’; legend ‘Discount rate’.]


report percentage changes at the 10th, 50th and 90th percentiles of 
distribution. b, Probability distribution of the change in cumulative global 
GDP by mid-century, assuming discount rates of 2.5% (dotted line), 3% 
(dashed line) and 5% (filled line). c, The equivalent for the end of the 
century. 


are much less so (Extended Data Tables 2 and 3). Results are also 
relatively insensitive to alternative bootstrap resampling approaches, 
to different SSPs, and to alternative assumptions about the time path of 
future warming for a given RCP (Extended Data Figs. 4, 5). Inclusion of 
additional lags of temperature in the historical regression—a common 
Lines represent local polynomial regression fits to the data with the
corresponding 95% confidence intervals shaded in grey. d-f, As in a-c, but 
for the probability of per capita GDP gain, calculated as the percentage of 
bootstrap response functions projecting a net gain in a country’s GDP per 
capita under 1.5 °C of global warming as compared to 2 °C. 
Fig. 4 | The impact of global warming on global GDP per capita, relative
to a world without warming, for different forcing levels. a, Projected 
percentage change in global GDP for different climate models under 
different RCP forcing scenarios, relative to a no-warming baseline 
(median bootstrap, SSP1). Colours denote different RCPs. Unfilled 
points show mid-century projections, filled points show end-of-century 
projections. Vertical lines show the UN temperature targets as well as 
the range of estimates of end-of-century warming under current Paris 
commitments’. Warming is relative to pre-industrial levels. b, c, Sources 
of uncertainty in estimates of global warming on cumulative global 
GDP loss for a given forcing level. Total uncertainty in the impact of 
warming on global GDP under a given forcing scenario is a combination 
of uncertainty in how economies respond to warming (‘historical 


approach to capturing persistent growth effects''—amplifies the 
effect of temperature on growth rates and results in larger estimates of 
benefits under 1.5°C (Extended Data Fig. 4). Other potential sources 
of uncertainty, such as uncertainty in the secular rate of growth beyond 
the scenarios prescribed by the SSPs, were not quantified and could 
increase overall impact uncertainty. 

At the country level, both the magnitude and the uncertainty of potential 
benefits are highly non-uniform. We find that 71% of countries— 
encompassing about 90% of projected global population—exhibit a 
>75% chance of experiencing positive economic benefits at 1.5°C rela- 
tive to 2°C (Fig. 3), and 59% of countries exhibit a >99% chance. These 
countries include the three largest economies (the USA has a 76% 
chance of positive benefits; China 85%; Japan 81%) (Fig. 3, maps). They 
also include a large fraction of the world’s poorest countries, with the 
likelihood of economic gains rising rapidly at lower levels of GDP per 
capita (Fig. 3c, f). Many of the countries that exhibit a high probability 
of economic benefits from 1.5 °C are concentrated in the tropics and 
sub-tropics, where both current and future temperatures are warmer 
than the economic optimum4. As a result, even small reductions in
future warming in these countries can generate substantial increases in 
per capita GDP, with many countries in the tropics exhibiting per capita 
GDP 10%-20% higher at 1.5°C than 2°C by the end of the century 
(Fig. 3a, b, d, e). The opposite is true for a smaller number of high- 
latitude countries, where 1.5 °C is estimated to slow growth and generate 
a high probability of negative impacts relative to 2°C. Achieving the 
1.5°C target will thus have unequal consequences, with today’s poorest 
countries benefiting the most. 

Despite the Paris Agreement’s focus on the 1.5°C and 2°C targets,
its actual Nationally Determined Contributions (NDCs) are instead 
consistent with 2.5-3 °C of global warming’. We estimate that this level 
of warming could lead to a reduction in global GDP as high as 10% by 
mid-century and 15%-25% by the end of the century (median estimates 
across SSPs; Fig. 4 and Extended Data Fig. 6), relative to a world that 
[Axis label: cumulative GDP loss, 2099 (US$ trillions).]

regression uncertainty’), uncertainty across climate models in the amount 
and pattern of warming for a given level of forcing (‘climate model 
uncertainty’), uncertainty in baseline future growth rates across baseline 
socioeconomic scenarios (‘SSP uncertainty’), and plausible alternatives for 
how to specify the discount rate (‘discount rate uncertainty’). Values show 
cumulative global GDP losses in trillions of US$ for mid-century under 
RCP4.5 (b) and the end of the century under RCP2.6 (c), either with all 
factors allowed to vary (‘total uncertainty’) or with the listed factor allowed 
to vary and all others fixed at their median (see Methods). Each vertical 
line is a point estimate; for example, with 32 climate models running 
RCP2.6 there are 32 estimates shown for ‘climate model uncertainty’ in 
c. Red lines are the median estimate across each uncertainty distribution. 


did not warm beyond 2000-2010 levels. In addition, failing to meet the 
NDC commitments is likely to lead to reductions in global GDP that 
exceed 25% by the end of the century. Uncertainty in these estimates is 
driven much more by uncertainty in economic parameters—namely, 
the economic response to warming and the discount rate—than by 
uncertainty in the pattern and magnitude of temperature change 
reflected in the climate model ensemble (Fig. 4b and c), highlighting 
the importance of better constraining these economic parameters”. 

Because our future impact estimates are based on observed histori- 
cal economic responses to temperature variability, our projections will 
misstate impacts if the relationship between future annual temperatures 
and climatic extremes differs from what has occurred historically, or if 
future societies respond differently from societies in the recent past— 
although there is growing evidence that economic development might 
not fundamentally alter these economy-environment linkages*!>"’”. 
We also cannot account for historically unprecedented changes, such as 
large-scale loss of land ice and associated sea level rise, which are more 
likely to occur®”! at 2°C than 1.5°C and are expected to exacerbate 
impacts?*?, 

To support policy decisions, our estimates of avoided damages need 
to be compared against the costs of meeting the UN targets. To our 
knowledge, no comparable estimates of global abatement costs through 
to the end of the century currently exist. However, a recent estimate” 
suggests that achieving emissions levels in 2030 that are consistent with 
the 1.5°C target will lead to approximately US$300 billion in additional 
(non-discounted) abatement costs relative to emissions consistent with 
2°C. This estimate of abatement costs is >30 times smaller than our 
median estimate of (discounted) mid-century avoided damages. 

Not accounting for abatement costs, our results suggest that 1.5°C 
global warming is “likely”” to result in substantial economic benefits 
relative to 2°C, with foregone damages probably in the tens of trillions of 
dollars and 59% of countries “virtually certain”” to benefit. Given that 
most of these countries feature large populations or high poverty rates 
or both, our results suggest that achieving more stringent mitigation
targets will probably generate a net global benefit, with particularly 
large benefits for the poorest populations. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0071-9.
METHODS


Deriving the historical response function. To understand the historical relationship 
between temperature and economic output, we assemble annual data on country- 
level GDP per capita from the World Bank’s World Development Indicators, using 
data on 165 countries over the period 1960 to 2010. Growth is computed as the first 
difference of the natural logarithm of the annual purchasing power parity-adjusted 
per capita GDP series in each country. These data are then merged with tempera- 
ture and precipitation data from the University of Delaware”*. The gridded monthly 
temperature and precipitation data are aggregated temporally to the annual level 
and spatially to the country level. We then follow ref. 4 and estimate a panel fixed 
effects model: 


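$$\Delta \log(y_{it}) = \beta_1 T_{it} + \beta_2 T_{it}^2 + \lambda_1 P_{it} + \lambda_2 P_{it}^2 + \mu_i + \nu_t + \theta_{1i} t + \theta_{2i} t^2 + \varepsilon_{it} \qquad (1)$$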


where $y_{it}$ is per capita GDP in country $i$ in year $t$, $T_{it}$ and $P_{it}$ are the average temperature and
precipitation in year $t$, $\mu_i$ are country-fixed effects (dummies) that control for time-invariant
differences between countries, $\nu_t$ are year-fixed effects that account for common
global shocks in a given year, and $\theta_{1i} t + \theta_{2i} t^2$ are country-specific linear and quadratic
time trends, which allow temperature and growth to evolve flexibly at the country level.

Equation (1) is estimated simultaneously on our global sample of country-years 
(N = 6,584). Point estimates for $\beta_1$ and $\beta_2$ are statistically significant in this
regression ($\beta_1 = 0.0127$, standard error 0.0032, P < 0.001; $\beta_2 = -0.0005$, standard error
0.0001, P < 0.001).

Equation (1) assumes that there is a single response function (described by $\beta_1$
and $\beta_2$) that specifies the overall global relationship between income growth and
changes in temperature, but that individual countries can respond differently to 
warming as a function of their average temperature (which can be seen by differen- 
tiating equation (1) with respect to temperature). Past work has shown that average 
temperature—rather than other correlated factors such as average income—is the 
main source of heterogeneity in how countries’ income growth responds to changes 
in temperature and that estimates of $\beta_1$ and $\beta_2$ are highly robust to alternative
specifications of the fixed effects and time controls4.
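The differentiation mentioned above can be written out explicitly. The worked numbers use only the point estimates reported for equation (1); the two example temperatures (5 °C and 25 °C) are arbitrary choices for illustration.

```latex
\frac{\partial\, \Delta \log(y_{it})}{\partial T_{it}} = \beta_1 + 2\beta_2 T_{it}

% With \beta_1 = 0.0127 and \beta_2 = -0.0005:
% at T = 5:   0.0127 + 2(-0.0005)(5)  =  0.0077  (warming raises growth)
% at T = 25:  0.0127 + 2(-0.0005)(25) = -0.0123  (warming lowers growth)
```

The same pair of coefficients therefore implies opposite-signed marginal effects in cool and hot countries, which is the sense in which countries respond differently as a function of their average temperature.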

An additional concern is that countries trade with one another and that unob- 
served temperature shocks across a trading network might lead to biased coeffi- 
cient estimates in equation (1). However, if temperature shocks are uncorrelated 
across trading partners, then estimates of $\beta_1$ and $\beta_2$ still represent unbiased estimates
of the effect of own-country temperature shocks on output; if shocks are correlated across
trading partners, then $\beta_1$ and $\beta_2$ represent reduced-form estimates of the net effect
in a given country of correlated shocks across that country’s trading network. The
main concern for our analysis is if the future pattern of temperature change should 
not correspond to the spatial pattern of historical shocks; however, we are unaware 
of any relevant research in climate science. 

To quantify uncertainty in estimates of $\beta_1$ and $\beta_2$, we implement multiple
bootstrapping strategies: (1) Sampling by country. From our list of 165 countries, 
draw (with replacement) a 165-element list of countries—which will omit some 
countries and contain duplicates of others—and retain all years of data for the 
selected countries; this is repeated 1,000 times, drawing a new country sample 
each time, re-estimating equation (1), and retaining estimates of $\beta_1$ and $\beta_2$. This
approach allows for arbitrary correlation in residuals within countries over time. 
(2) Sampling by year. This allows for potential cross-sectional correlation in residu- 
als in a given year, and is also repeated 1,000 times. (3) Sampling by five-year block. 
We divide the data into 10 five-year blocks (that is, 1961-65, 1966-70, and so on 
through 2010), and sample with replacement from these 10 blocks. This allows for 
both temporal and cross-sectional dependence in residuals, for example, as caused 
by global recessions that last multiple years. 
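
The three resampling strategies share one mechanism: whole clusters of rows (countries, years or five-year blocks) are drawn with replacement and all rows of each drawn cluster are kept. The following is a minimal sketch of that mechanism on a toy panel; the helper name `resample_clusters` and the toy data are assumptions made for illustration, and the regression step itself is omitted.

```python
import numpy as np


def resample_clusters(cluster_ids: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return row indices for one bootstrap draw that resamples whole clusters.

    Clusters (countries, years or five-year blocks) are drawn with replacement;
    every row of a drawn cluster is kept, and a cluster drawn twice appears twice.
    """
    unique_ids = np.unique(cluster_ids)
    drawn = rng.choice(unique_ids, size=unique_ids.size, replace=True)
    return np.concatenate([np.flatnonzero(cluster_ids == c) for c in drawn])


# Toy panel: 4 hypothetical countries observed in 1961-1970.
countries = np.repeat(np.array(["A", "B", "C", "D"]), 10)
years = np.tile(np.arange(1961, 1971), 4)
blocks = (years - 1961) // 5          # five-year block label for each row

rng = np.random.default_rng(0)
rows_by_country = resample_clusters(countries, rng)   # strategy (1)
rows_by_year = resample_clusters(years, rng)          # strategy (2)
rows_by_block = resample_clusters(blocks, rng)        # strategy (3)
# Each index array would then be used to subset the data and re-estimate equation (1).
```

Repeating a draw 1,000 times and storing the re-estimated coefficients each time would give the bootstrap distribution described in the text.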

Our main results use strategy (1) (sampling by country), but we show that our
results are robust, regardless of the strategy used. In what follows, the bootstrapped
response functions $h^j(T_{it}) = \beta_1^j T_{it} + \beta_2^j T_{it}^2$ are indexed with $j$, where
$j \in \{1, 2, \ldots, 1{,}000\}$.

For each $h^j(T)$, we define the ‘temperature optimum’ as the maximum of the
quadratic function, that is, $-\beta_1^j / (2\beta_2^j)$ (this is always a maximum because all
estimates yield $\beta_1^j > 0$ and $\beta_2^j < 0$).

To ensure that equation (1) is capturing growth effects and not just level
effects, we re-estimate equation (1) with additional lags of temperature (and their
squares)*!!. This is important because countries’ economic output could ‘catch up’
in the year following a temperature shock; this catch-up behaviour would not be
captured in a model containing only contemporaneous temperature variables, but
would be captured in a model that includes lags of temperature and where overall
temperature effects are computed by summing contemporaneous and lagged
coefficients'!. We thus estimate equation (1) with up to five lags $l$ of temperature, that is:

$$h(T_{it}) = \sum_{l=0}^{5} \left[ \beta_{1l} T_{it-l} + \beta_{2l} T_{it-l}^2 \right] \qquad (2)$$


and re-estimate all calculations below with results from these distributed lag mod- 
els. Our main results with this sensitivity test are shown in Extended Data Fig. 4. 
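
To make the idea of summing contemporaneous and lagged coefficients concrete, here is a minimal sketch that computes the cumulative marginal effect of a sustained temperature change from a distributed lag model. The coefficient values are invented for illustration, and the function assumes temperature is held at the same level in the current and all lagged years.

```python
import numpy as np


def cumulative_marginal_effect(b1_lags, b2_lags, avg_temp_c: float) -> float:
    """Sum contemporaneous and lagged coefficients, then differentiate the quadratic.

    b1_lags[l] and b2_lags[l] multiply T and T**2 at lag l (l = 0 is the current year).
    """
    b1_sum = float(np.sum(b1_lags))
    b2_sum = float(np.sum(b2_lags))
    return b1_sum + 2.0 * b2_sum * avg_temp_c


# Hypothetical coefficients for a model with two lags (illustrative numbers only).
b1 = [0.012, -0.003, 0.001]
b2 = [-0.0005, 0.0001, 0.0]
effect_at_20c = cumulative_marginal_effect(b1, b2, 20.0)
print(f"cumulative marginal effect at 20 C: {effect_at_20c:+.4f}")
```

In this toy case the lagged terms partly offset the contemporaneous effect, which is the kind of catch-up behaviour the lag specification is meant to detect.
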
Climate model simulations. To follow the IPCC protocols, we analyse the 
exact climate model realizations and time periods used by the IPCC in its most 
recent assessment report>. These climate model realizations were generated by 
the World Climate Research Program under Phase Five of the Climate Model 
Intercomparison Project (CMIP5)!®. For the historical baseline experiment, the 
CMIP5 protocol ran each climate model from the mid-1800s to 2005, using the 
historical climate forcings. For the future scenarios, the CMIP5 protocol used the 
RCPs, which assume different levels of climate forcing going forward in the 21st 
century. In total, there are four: RCP2.6, RCP4.5, RCP6.0 and RCP8.5. 

Following the IPCC protocols, we use the same historical baseline period 
(1986-2005) and RCP future periods (2046-2065 and 2081-2100) as did the 
IPCC. In our bias correction method (see below), there are three RCPs whose 
global warming ranges are most consistent with the 1.5°C and 2°C targets in these 
IPCC scenario time periods: RCP4.5 and RCP6.0 during the 2046-2065 period, 
and RCP2.6 during the 2081-2100 period. (RCP2.6 is the only RCP scenario in 
which some models project global warming of less than 1.5°C for the end of the 
century; for mid-century, none of the RCP2.6 model runs project warming above 
2°C, and so we do not utilize RCP2.6 for mid-century). We therefore calculate the 
distribution of GDP outcomes in response to the global warming levels projected 
during the 2046-2065 period of RCP4.5 and RCP6.0, and during the 2081-2100 
period of RCP2.6. In addition, to compare the probability of economic impacts 
for the UN targets with the probability of those for higher levels of greenhouse gas 
emissions, we also calculate the distribution of GDP outcomes for the 2046-2065 
and 2081-2100 periods of RCP8.5. 

Uncertainty in the temperature-driven GDP impacts of a given level of green- 
house gas emissions arises from both uncertainty in the level of global warming 
associated with that level of emissions and uncertainty in the spatial pattern of 
temperature at that level of global warming. The IPCC climate analysis protocols 
span these uncertainty dimensions by analysing one realization of each climate 
model in each RCP scenario’. To follow the IPCC protocols, we analyse the same 
realizations as the IPCC. 

However, it should be noted that the CMIP5 ensemble does not span the 
full range of each uncertainty dimension in a fully uniform framework. Rather, 
although the experimental conditions for the ensemble were coordinated between 
the modelling centres, both the models and the implementation of the simulation 
conditions vary across the ensemble. For example, the ensemble includes simula- 
tions from all national modelling centres that chose to participate, but not every 
modelling centre archived a simulation in each scenario. As a result, the IPCC 
selection of one realization of each model in each RCP yields different numbers of 
realizations—and model combinations—in each RCP (42 realizations in RCP4.5, 
32 in RCP2.6, 25 in RCP6.0 and 39 in RCP8.5). Likewise, although each modelling 
centre conformed to a basic set of coordinated experimental conditions, the exact 
implementation of those conditions varied between the centres. This combina- 
tion of coordinated but incomplete experimental uniformity has led the CMIP5 
ensemble to be known as ‘an ensemble of opportunity’. As in the IPCC, we leverage
the CMIP5 ensemble of opportunity to estimate an approximate probability dis- 
tribution; it should be emphasized that this approach is not identical to sampling 
across a probabilistic ensemble”’. 

Because we use GDP data through 2010 and attempt to quantify economic 
impacts from that year forward, we must also project global and country-level tem- 
perature changes forward from the year 2010. To do so, and to control for individ- 
ual climate model biases in average temperatures, we first calculate the difference 
between model-projected annual average future temperatures (in 2046-2065 or 
2081-2100) and model-simulated annual average temperatures in the baseline 
1986-2005 period. We then add those model-projected differences to the actual 
historical temperature observations. 

For each climate model $m$ corresponding to a chosen RCP scenario $s$ at a given
time period, we first calculate two quantities: (1) The magnitude of global temperature
change $\Delta T^{sm}$, which is the difference in annual average global surface temperature
between a 1986-2005 baseline period and a future period (either 2046-2065
or 2081-2100). Gridded temperature projections relative to this baseline period
are produced at 2.5° resolution. These are aggregated to a scalar ‘global warming’
projection by taking an average over all grid cells, with each cell $g$ weighted by the
cosine of the latitude $L_g$ of its centrepoint (given the convergence of lines
of longitude towards the poles):

$$\Delta T^{sm} = \frac{\sum_g \left\{ \cos(L_g) \times \left( T_{g,\mathrm{end}}^{sm} - T_{g,\mathrm{base}}^{sm} \right) \right\}}{\sum_g \cos(L_g)} \qquad (3)$$


(2) The magnitude of each country $i$’s temperature change $\Delta T_i^{sm}$, analogously
computed by taking the average projected temperature change of all cells $g$ but
weighted by their share of country $i$’s population $P_{ig}$, rather than by their relative
surface area. Gridded population distribution data”* are provided at 30-arc-sec
resolution and are aggregated to 2.5° resolution to match the temperature projection
data. Thus, country-level temperature change projections are described by the
equation:

$$\Delta T_i^{sm} = \frac{\sum_g \left\{ P_{ig} \times \left( T_{g,\mathrm{end}}^{sm} - T_{g,\mathrm{base}}^{sm} \right) \right\}}{\sum_g P_{ig}} \qquad (4)$$
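
The two aggregations above differ only in their weights: cosine of latitude for the global mean and gridded population for the country mean. The sketch below illustrates both on a synthetic 2.5° grid; the function names, the synthetic warming field and the rectangular "country" are assumptions made for illustration and do not come from the study's data.

```python
import numpy as np


def global_mean_change(delta_t: np.ndarray, lat_deg: np.ndarray) -> float:
    """Cosine-of-latitude weighted mean of gridded temperature change (eq. 3).

    delta_t has shape (n_lat, n_lon); lat_deg holds the cell-centre latitudes.
    """
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(delta_t)
    return float(np.sum(weights * delta_t) / np.sum(weights))


def country_mean_change(delta_t: np.ndarray, population: np.ndarray) -> float:
    """Population-weighted mean change over the cells of one country (eq. 4).

    population is zero for cells outside the country.
    """
    return float(np.sum(population * delta_t) / np.sum(population))


# Toy 2.5-degree grid with synthetic warming that grows towards the poles.
lat = np.arange(-88.75, 90.0, 2.5)
lon = np.arange(1.25, 360.0, 2.5)
delta = 1.0 + 2.0 * np.abs(np.sin(np.deg2rad(lat)))[:, None] * np.ones((lat.size, lon.size))

pop = np.zeros_like(delta)
pop[40:44, 10:14] = 1.0e6   # hypothetical country occupying 16 low-latitude cells

dT_global = global_mean_change(delta, lat)
dT_country = country_mean_change(delta, pop)
print(f"global: {dT_global:.2f} C, country: {dT_country:.2f} C, unweighted: {delta.mean():.2f} C")
```

Because the synthetic warming is strongest near the poles, the area-weighted global mean comes out lower than the unweighted grid mean, which shows why the cosine weighting matters.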


To express the future global-scale temperature values relative to pre-industrial 
values, as in the UN temperature targets, we add these model-projected dif- 
ferences between the future and the baseline to the global-scale warming that 
occurred between the pre-industrial period and the end of the period of GDP and 
temperature observations (which extends to 2010). According to the IPCC, the 
“globally averaged combined land and ocean surface temperature data as calcu- 
lated by a linear trend, show a warming of 0.85 [0.65 to 1.06] °C, over the period 
1880-2012,” and the “total increase between the average of the 1850-1900 period
and the 2003-2012 period is 0.78 [0.72 to 0.85] °C””’. We therefore assume that 
0.8°C of warming took place between the pre-industrial period and the end of our 
observational period. Thus for the global averages $\Delta T^{sm}$, “global warming relative
to pre-industrial” is equal to $\Delta T^{sm} + 0.8$ for all $s$ and $m$.

To generate annual country-specific time series of projected future changes in
temperature for input into the simulations below, we assume that temperatures
increase linearly between the base period and the end period, and then add the
linearized projected change in temperature to the observed average baseline
temperature, thus ‘bias-correcting’ future national temperature time series. Thus for
a given climate model-RCP realization, if the observed average historical temperature
during the base period is $\bar{T}_{i0}$, then the projected temperature in
each future year is:

$$T_{it}^{sm} = \bar{T}_{i0} + \frac{t - t_{\mathrm{base}}}{t_{\mathrm{end}} - t_{\mathrm{base}}} \, \Delta T_i^{sm} \qquad (5)$$

where $t_{\mathrm{base}} = 2010$ is the initial year of our simulation and $t_{\mathrm{end}}$ is either 2049 or
2099. (As before, small $t$ indexes time and capital $T$ refers to temperature.) The
assumed linear temperature increase appears to be consistent with RCP4.5 or RCP6.0
through mid-century; it is perhaps less consistent with RCP2.6 through the end of
the century, as RCP2.6 warms through mid-century and then stabilizes through to
the end of the century. To understand whether our assumed linear warming path 
distorts our findings for RCP2.6, we conduct an additional experiment in which we 
assume all warming under RCP2.6 occurs by 2049, and then temperatures stabilize 
at this new level between 2050 and 2099 (Extended Data Fig. 5). This scenario 
has the same projected global warming by the end of the century as our baseline 
RCP2.6 scenario, but all warming is assumed to happen in the first half of the 21st 
century. As shown in Extended Data Fig. 5, we find that the scenario with rapid 
initial warming worsens the overall impacts of climate change and increases the 
cumulative benefits of limiting warming to 1.5°C versus 2°C. 
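
The linear warming path of equation (5) and the front-loaded alternative used in the RCP2.6 sensitivity test can be sketched as follows. The function names and the example country (baseline mean of 24.0 °C, projected change of +1.2 °C) are hypothetical; only the 2010 start year and the 2049 and 2099 end years are taken from the text.

```python
import numpy as np

T_BASE_YEAR = 2010


def linear_path(t_bar_obs: float, delta_t: float, end_year: int) -> np.ndarray:
    """Temperature for each year from 2010 to end_year under a linear ramp (eq. 5)."""
    years = np.arange(T_BASE_YEAR, end_year + 1)
    frac = (years - T_BASE_YEAR) / (end_year - T_BASE_YEAR)
    return t_bar_obs + frac * delta_t


def front_loaded_path(t_bar_obs: float, delta_t: float,
                      ramp_end: int = 2049, end_year: int = 2099) -> np.ndarray:
    """Alternative path: all warming occurs by ramp_end, then temperature is flat."""
    years = np.arange(T_BASE_YEAR, end_year + 1)
    frac = np.clip((years - T_BASE_YEAR) / (ramp_end - T_BASE_YEAR), 0.0, 1.0)
    return t_bar_obs + frac * delta_t


# Hypothetical country: observed baseline mean 24.0 C, projected change +1.2 C.
linear = linear_path(24.0, 1.2, 2099)
front = front_loaded_path(24.0, 1.2)
print(f"2049: linear {linear[39]:.2f} C, front-loaded {front[39]:.2f} C")
```

Both paths end at the same temperature in 2099, but the front-loaded path is at least as warm in every intermediate year, so a country spends more years at the higher temperature.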

Defining counterfactual growth scenarios. To project growth in GDP absent cli- 
mate change, we use projections from the SSPs, a framework developed to describe 
conditions associated with various degrees of climate forcing by the end of the cen- 
tury. In all, there are five SSP narratives, each making different assumptions about 
mitigation and adaptation challenges, demographic trends, and developments 
in the energy industry!” We exploit the time series of projected country-level 
economic growth and population from 2010 to 2095 associated with the SSP1 
narrative, because this appears to be the SSP most consistent with the forcing levels 
required to achieve 1.5°C warming in 2049 or 2099° (although, as pointed out by 
ref. °, with high enough carbon pricing all SSPs could potentially be consistent with 
1.5°C warming by mid-century, and three SSPs could be consistent with 1.5°C 
warming by the end of the century). SSP1 is described as an optimistic future 
with ‘low’ challenges to adaptation and mitigation. SSP1 is characterized by many 
developing countries contributing an increasingly large share of global GDP by the 
end of the century (Extended Data Fig. 1a and b), with a larger share of total global 
GDP projected to be produced in countries with warmer average temperatures by 
the end of the century absent climate change (Extended Data Fig. 1c). In addition 
to using SSP1, we also test the robustness of our results to alternative choices from 
the other four SSPs (Fig. 4 and Extended Data Fig. 6). 

Projecting economic impacts of 1.5°C versus 2°C. Step (1). Assemble input data.
Required input data are the parameters of each response function $h^j(T_{it})$ estimated
from each of the $j$ bootstraps of equation (1); projections of country-year average
temperature $T_{it}^{sm}$ for each GCM $m$ for a given RCP scenario $s$ through to 2049 or
2099; projections of baseline country-year per capita growth rates $\lambda_{it}^{k}$ and
populations $w_{it}$ through 2099, for each country $i$ and year $t$, from a given SSP scenario $k$.

Step (2). Calculate country-specific growth trajectories for each bootstrap-RCP-
GCM-SSP combination. Projections are initialized using average temperature
and GDP per capita between 2000 and 2010 as the baseline for each of the countries
in our analysis. For a given historical bootstrap run $j$ and GCM-RCP-SSP
projection $smk$, GDP per capita $y$ in each future year $t+1$ in country $i$ is projected
by the equation:

$$y_{it+1}^{jsmk} = y_{it}^{jsmk} \times \left( 1 + \lambda_{it+1}^{k} + \varphi_{it+1}^{jsm} \right) \qquad (6)$$

where $\lambda_{it+1}^{k}$ is the level of economic growth projected by the data corresponding
to the particular SSP series and $\varphi_{it+1}^{jsm} = h^j(T_{it+1}^{sm}) - h^j(\bar{T}_{i0})$ is the additional
estimated change in the growth rate due to the projected temperature increase above baseline
for bootstrap run $j$ and GCM projection $ms$. We also run a counterfactual
no-warming scenario where temperatures are fixed at baseline levels (that is,
$T_{it+1} = \bar{T}_{i0}$ and $\varphi_{it+1} = 0$ for all $i$ and $t$):

$$y_{it+1}^{k} = y_{it}^{k} \times \left( 1 + \lambda_{it+1}^{k} \right) \qquad (7)$$

With 165 countries, 1,000 bootstrap estimates of the temperature response function
$h(\cdot)$, 100 total temperature time series (corresponding to 42, 25 and 32 climate
models for mid-century RCP4.5, mid-century RCP6.0, and end-of-century
RCP2.6, respectively, plus the constant-temperature series), five SSPs, and five
bootstrap resampling schemes, we analysed more than 400 million distinct
country-level economic pathways.
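
The recursion in equations (6) and (7) can be sketched for a single country as below. This is an illustration under stated assumptions rather than the study's code: the function names, the 2% baseline growth rate, the starting income and the temperature path are all hypothetical, and the response coefficients are the point estimates reported for equation (1).

```python
import numpy as np


def response(temp, b1: float, b2: float):
    """Quadratic response function h(T) = b1*T + b2*T**2."""
    return b1 * temp + b2 * temp ** 2


def project_gdp_per_capita(y0: float, baseline_growth: np.ndarray,
                           temps: np.ndarray, t_bar: float,
                           b1: float, b2: float) -> np.ndarray:
    """Iterate equation (6) for one country and one bootstrap/GCM/SSP combination.

    baseline_growth[t] and temps[t] apply to the step that produces year t + 1.
    Passing temps equal to t_bar everywhere gives the no-warming path of equation (7).
    """
    phi = response(temps, b1, b2) - response(t_bar, b1, b2)
    growth_factors = 1.0 + baseline_growth + phi
    return y0 * np.concatenate([[1.0], np.cumprod(growth_factors)])


# Hypothetical inputs: 2% baseline growth, a hot country warming by 2 C over 40 years.
n_years = 40
growth = np.full(n_years, 0.02)
t_bar = 25.0
temps = t_bar + np.linspace(0.05, 2.0, n_years)

warming = project_gdp_per_capita(1000.0, growth, temps, t_bar, 0.0127, -0.0005)
no_warming = project_gdp_per_capita(1000.0, growth, np.full(n_years, t_bar), t_bar,
                                    0.0127, -0.0005)

# Count of country-level pathways implied by the figures quoted in the text.
n_pathways = 165 * 1000 * 100 * 5 * 5
print(f"final-year ratio: {warming[-1] / no_warming[-1]:.3f}, pathways: {n_pathways:,}")
```

The product of the quoted counts is 412.5 million, which matches the statement that more than 400 million pathways were analysed.
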
Step (3). Calculate global GDP trajectories for each bootstrap-RCP-GCM-SSP
combination. For each GCM-bootstrap-SSP combination in a given period $t$, global
GDP per capita is calculated as the average GDP per capita across countries,
weighted by share of world population:

$$y_t^{jsmk} = \sum_i \frac{w_{it}}{W_t} \, y_{it}^{jsmk} \qquad (8)$$

where $w_{it}/W_t$ is country $i$’s projected share of global population in year $t$ for a given
SSP. We similarly produce a time series of total global GDP by replacing $w_{it}/W_t$ with
$w_{it}$, country $i$’s projected population in that year. This is also calculated for the
no-warming scenario, yielding counterfactual global time series $y_t^{k}$ and
$Y_t^{k}$, where $Y_t$ denotes GDP.
Step (4). Calculate projected percentage changes in GDP or global GDP relative to the
no-warming counterfactual for each bootstrap-RCP-GCM-SSP combination. For
each bootstrap-RCP-GCM-SSP combination, we calculate the warming-induced
percentage change in GDP relative to the counterfactual no-warming scenario in
each country as:

$$\Psi_{it}^{jsmk} = \frac{y_{it}^{jsmk}}{y_{it}^{k}} - 1 \qquad (9)$$

This is calculated for $t$ = 2049 for RCP4.5 and RCP6.0, and $t$ = 2099 for RCP2.6.
The percentage impact on global GDP per capita, $\Psi_t^{jsmk}$, is calculated similarly for
these endpoint years.
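
Steps (3) and (4) amount to a population-weighted average followed by a ratio against the no-warming path. A minimal sketch with two hypothetical countries and three years is given below; all numbers are invented, and the result is multiplied by 100 here so that it reads directly in per cent.

```python
import numpy as np


def global_gdp_per_capita(y: np.ndarray, pop: np.ndarray) -> np.ndarray:
    """Population-weighted average of country GDP per capita (eq. 8).

    y and pop have shape (n_countries, n_years).
    """
    return np.sum(pop * y, axis=0) / np.sum(pop, axis=0)


def percent_change(y_warming: np.ndarray, y_no_warming: np.ndarray) -> np.ndarray:
    """Warming-induced change relative to the no-warming counterfactual, in per cent."""
    return 100.0 * (y_warming / y_no_warming - 1.0)


# Two hypothetical countries over three years.
pop = np.array([[100.0, 110.0, 120.0],
                [300.0, 300.0, 300.0]])
y_cf = np.array([[10.0, 10.5, 11.0],     # no-warming GDP per capita
                 [40.0, 41.0, 42.0]])
y_w = np.array([[10.0, 10.2, 10.4],      # GDP per capita with warming
                [40.0, 41.2, 42.4]])

country_impacts = percent_change(y_w, y_cf)
global_impact = percent_change(global_gdp_per_capita(y_w, pop),
                               global_gdp_per_capita(y_cf, pop))
print(country_impacts[:, -1], global_impact[-1])
```

The toy example has one country that loses and one that gains, so the sign of the global figure depends on the population weights and income levels, as it does in the full analysis.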

Step (5). Calculate projected discounted absolute changes in GDP or global GDP
relative to the no-warming counterfactual for each bootstrap-RCP-GCM-SSP
combination. The cumulative absolute dollar impact of warming is calculated for
each country by taking the annual difference between the unique bootstrap-RCP-
GCM-SSP projected GDP time series and the counterfactual no-warming time
series, and discounting these differences back to present:

$$\Theta_i^{jsmk} = \sum_t \frac{Y_{it}^{jsmk} - Y_{it}^{k}}{(1 + r_t)^{t - t_{\mathrm{base}}}} \qquad (10)$$

where $Y_{it}^{jsmk} = y_{it}^{jsmk} \times w_{it}$ and $r_t$ is the social discount rate that could vary with $t$.
The global absolute impact is calculated by summing country-level impacts:
$\Theta^{jsmk} = \sum_i \Theta_i^{jsmk}$.

Given the long-running and unresolved debate over how r should be specified, 
we calculate $\Theta^{jsmk}$ under a range of approaches to specifying $r$. Specifically, we
implement a variety of approaches discussed and implemented by previous authors, 
including implementations of the Ramsey equation with and without uncertainty 
and under alternate parameter choices for time preference and the marginal utility 
of consumption**-*4, calibrations to historical market interest rates in the USA***°, 
and constant discount rates*” ranging from 2.5%-5%. Choices about the discount 
rate clearly have large implications for the estimation of damages. For instance, 
US$1,000 of damages in 50 years is worth US$228 today under a 3% annual dis- 
count rate, but only US$87 under a 5% annual rate. 
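
The worked example in the preceding sentence can be reproduced with a constant annual rate, as sketched below. The helper names are illustrative, and the second function assumes a single constant rate, whereas the analysis also uses rates that vary over time.

```python
def present_value(amount: float, rate: float, years: int) -> float:
    """Discount a single future amount back to today at a constant annual rate."""
    return amount / (1.0 + rate) ** years


def cumulative_discounted_impact(annual_differences, rate: float) -> float:
    """Sum of annual GDP differences, each discounted from its year back to year 0."""
    return sum(d / (1.0 + rate) ** t for t, d in enumerate(annual_differences))


pv_3 = present_value(1000.0, 0.03, 50)   # US$1,000 in 50 years at 3%
pv_5 = present_value(1000.0, 0.05, 50)   # US$1,000 in 50 years at 5%
print(f"3%: US${pv_3:.0f}, 5%: US${pv_5:.0f}")
```

The two results round to US$228 and US$87, the figures quoted in the text.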

As described by multiple authors*?74"8, choices about $r$ can be approached from
the perspective of a social planner wishing to maximize the welfare of society. The
central intuitions in this approach are that extra income or consumption is worth
more to poor people than it is to rich people, and that with rising incomes a dollar
of additional income is worth less in the future than it is today. Under standard
assumptions about the functional form of the ‘utility function’ that relates changes
in consumption to changes in utility, this approach yields the Ramsey formula,
which specifies the annual discount rate on consumption as:

$$r = \rho + \eta g \qquad (11)$$

where $\rho$ is the pure or social rate of time preference (the rate at which society
discounts the utility of future generations), $\eta$ is the elasticity of marginal utility
of consumption (or how fast the marginal utility of consumption declines as consumption
increases), and $g$ is the growth rate in consumption. If there is uncertainty about the
growth rate in consumption, a third term is added to the Ramsey equation which
induces a precautionary savings effect™:

$$r = \rho + \eta g - 0.5 \eta^2 \sigma_g^2 \qquad (12)$$

where $\sigma_g^2$ is the variance in the growth rate. Uncertainty in future consumption
growth enters negatively as the social planner, facing the possibility of slow future
growth, wishes to transfer more resources to the future.
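
A minimal sketch of the Ramsey rule with and without the uncertainty term follows. All inputs are hypothetical and expressed in decimal units (0.02 means 2% per year); the function name is an assumption made for illustration.

```python
def ramsey_rate(rho: float, eta: float, growth: float, growth_variance: float = 0.0) -> float:
    """Ramsey discount rate r = rho + eta*g - 0.5*eta**2*var(g), all in decimal units."""
    return rho + eta * growth - 0.5 * eta ** 2 * growth_variance


# Hypothetical inputs: 2% consumption growth, growth-rate variance of 0.0004
# (a standard deviation of 2 percentage points), rho = 0.1% and eta = 1.
r_certain = ramsey_rate(rho=0.001, eta=1.0, growth=0.02)
r_uncertain = ramsey_rate(rho=0.001, eta=1.0, growth=0.02, growth_variance=0.0004)
print(f"without uncertainty: {r_certain:.4f}, with uncertainty: {r_uncertain:.4f}")
```

With these inputs the uncertainty term lowers the rate only slightly, from 2.10% to 2.08%, because the variance enters multiplied by one half of the squared elasticity.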

Using equations (11) and (12) and parameter choices about $\rho$ and $\eta$ from three
benchmark studies*”-*? (Stern $\rho$ = 0.1, $\eta$ = 1; Nordhaus $\rho$ = 1, $\eta$ = 2; and Weitzman
$\rho$ = 2, $\eta$ = 2; see Extended Data Fig. 1), we implement six versions of the Ramsey
approach: three without uncertainty in future growth and three with uncertainty.
For each bootstrap-RCP-GCM-SSP run, we define the growth rate $g_t$ as the
population-weighted average growth rate of GDP per capita:

$$g_t^{jsmk} = \sum_i \frac{w_{it}}{W_t} \left( \lambda_{it}^{k} + \varphi_{it}^{jsm} \right) \qquad (13)$$


with parameters defined as in equations (6) and (8) above. Average values across
GCMs are shown in Extended Data Fig. 1a. Uncertainty in the growth rate for
each future year is calculated as $\sigma_{g_t}^2 = \mathrm{var}(g_t^{jsmk})$, that is, the variance in projected
growth rates in a given year across all bootstrap-RCP-GCM-SSP estimates. This
probably represents a substantial lower bound on the true uncertainty in the
growth rate, as it accounts only for uncertainty induced by additional warming
and not for uncertainty in the underlying secular rate of growth (for which the
SSPs do not provide uncertainty estimates).

Parameter choices and estimates of future growth rates are then used in either 
equation (11) or (12) to calculate year-specific discount rates $r_t$. The resulting
estimates of Ramsey-based discount rates are shown in Extended Data Fig. 1b. All 
versions estimate higher interest rates in earlier periods, which is primarily a result 
of higher estimated baseline (SSP) growth rates in the earlier half of the century. 
Discount rates by end of century using the Ramsey approach range from 1.2% 
(Stern) to 4.2% (Weitzman), with the inclusion of the uncertainty term lowering 
discount rates only slightly. 

Given that future baseline growth rates in developing and developed countries 
could be different, and given that the marginal effect of warming will probably 
differ between developing and developed countries given their different baseline 
temperatures, we also run scenarios where discount rates are allowed to differ 
between rich and poor countries (defined as being below or above the median 
level of GDP per capita at baseline). Specifically, using SSP1 data we produce sep- 
arate population-weighted growth series for poor and rich countries (as shown 
in Extended Data Fig. 1c), and plug these growth projections into the Ramsey 
equation for each of the three benchmark choices of $\rho$ and $\eta$ to produce the six time
series of discount rates that appear in Extended Data Fig. 1d. These income-spe- 
cific discount rates, which are higher for poor countries than for rich countries 
given differences in baseline growth rates, are then applied to the relevant country 
groupings in the calculations below. As shown in Extended Data Fig. 3, allowing 
for income-specific discount rates results in higher median estimates of the global 
benefit of restricting warming to 1.5°C. This is because global benefits are driven 
largely by impacts in the largest economies, including the USA and China, and 
allowing for income-specific discount rates lowers the rates for rich countries rel- 
ative to the pooled scheme (for example, compare Extended Data Fig. 1b against 
Extended Data Fig. 1d), which translates to larger cumulative benefits in large 
economies projected to be harmed by warming (which again includes both the 
USA and China). 

Beyond the Ramsey framework, another approach to specifying the dis- 
count rate uses the observed evolution of market interest rates over long peri- 
ods combined with models of interest rate behaviour to project interest rates. 
We extract estimates from two of these exercises*>*°, both of which assume an 
initial interest rate of 4% and then project interest rates to fall by almost half 
by end of century (Groom and Newell-Pizer; Extended Data Fig. 1b). Unlike 
for the Ramsey discount rates, we assume these market discount rates are the 
same across bootstrap-RCP-GCM-SSP combinations, and just vary over time 
as shown in the plot. 


For each bootstrap-RCP-GCM-SSP combination, each of these fourteen discount
rates (six Ramsey with global average income, three Ramsey with rich/poor
differences, two market-based, and fixed rates of 2.5%, 3% and 5%) is calculated
and used in equation (10) to calculate the present value (in 2010) of the
damages from warming.

Step (6). Calculate percentage or absolute damages at 1.5°C versus 2°C. To calculate
relative damages at 1.5°C versus 2°C for a given bootstrap-RCP-SSP combination,
we take estimates of percentage impacts $\Psi^{jsmk}$ or discounted absolute impacts
$\Theta^{jsmk}$ across GCMs and fit a linear least-squares regression that relates estimated
damages to the amount of global warming projected by the climate model by the
end of the projection period ($\Delta T^{sm}$). So for absolute damages in a given country,
this regression is:

$$\Theta_i^{jsmk} = \beta_i^{jsk} \Delta T^{sm} + \varepsilon_i^{jsmk} \qquad (14)$$

This relation is shown to be well approximated at the global level by a linear model
(Fig. 1e-g). The slope of the linear fit $\beta_i^{jsk}$ is that bootstrap’s estimate of the
per-degree-Celsius impact of global temperature change on GDP per capita in country $i$.
Halving this value thus gives us the impact of a half-degree change in global
temperature for a given bootstrap, which, given linearity, is the estimated impact of
limiting global warming to 1.5°C relative to 2.0°C in that country. Equation (14)
is then re-estimated for each country and for each bootstrap, generating 1,000
estimates of impacts in each country for each RCP and SSP combination. We also
estimate equation (14) at the global level to generate comparable results on
percentage and absolute damages to global GDP. Global results are shown in Figs. 1
and 2, and country-level results are shown in Fig. 3a and b.
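
The per-degree slope of equation (14) and its halving can be sketched as below. The data are synthetic (30 hypothetical climate models with an assumed true slope of −8 in arbitrary units), and the fit omits an intercept on the assumption that equation (14) is estimated exactly as written.

```python
import numpy as np


def per_degree_slope(delta_t: np.ndarray, impacts: np.ndarray) -> float:
    """Least-squares slope of impacts on global warming, without an intercept."""
    return float(np.sum(delta_t * impacts) / np.sum(delta_t ** 2))


# Synthetic example: 30 hypothetical climate models for one bootstrap run.
rng = np.random.default_rng(1)
warming = rng.uniform(1.0, 3.0, size=30)                    # global warming per model (C)
impacts = -8.0 * warming + rng.normal(0.0, 0.5, size=30)    # impact per model (arbitrary units)

slope = per_degree_slope(warming, impacts)   # impact per degree Celsius of global warming
half_degree_effect = 0.5 * slope             # impact of a half-degree change in warming
print(f"slope: {slope:.2f} per C, half-degree effect: {half_degree_effect:.2f}")
```

Repeating the fit for every bootstrap run would give the distribution of half-degree effects from which medians and intervals are read.
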
Step (7). Calculate probability of economic benefits of limiting warming to 1.5°C
versus 2°C. Finally, we calculate the probability of economic gain under the 1.5°C
versus 2°C scenarios (that is, the probability that damages from 1.5°C of global
warming will be smaller than damages from 2°C of global warming) as the fraction
of estimates of $\beta^{jsk}$ across 1,000 bootstrap runs that are negative. This is
calculated for the world as a whole, as well as separately for each country (Fig. 3c and d).
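
The probability in Step (7) is simply the share of bootstrap slopes below zero. A short sketch follows, using normally distributed random numbers as a stand-in for the 1,000 bootstrap estimates; the distribution and its parameters are assumptions chosen only for illustration.

```python
import numpy as np


def probability_of_benefit(slopes: np.ndarray) -> float:
    """Fraction of bootstrap slopes that are negative (more warming, lower GDP)."""
    return float(np.mean(slopes < 0.0))


# Synthetic stand-in for 1,000 bootstrap slope estimates (arbitrary units).
rng = np.random.default_rng(2)
bootstrap_slopes = rng.normal(loc=-1.0, scale=1.0, size=1000)
p_benefit = probability_of_benefit(bootstrap_slopes)
print(f"probability of benefit from limiting warming: {p_benefit:.2f}")
```
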
Quantifying impacts of global warming beyond 2°C. Recent estimates suggest 
that countries’ current mitigation commitments (NDCs) are unlikely to limit global 
warming to 2°C and are instead more likely to be consistent with warming in a 
2.5-3°C range’. To evaluate the impact of warming under these alternative warming
outcomes, as well as for warming that exceeds 3°C, we recalculate estimates of
$\Psi^{jsmk}$ and $\Theta^{jsmk}$ across all RCPs $s$ and for all SSPs $k$. This provides estimates of the
global impact of various warming scenarios relative to a no-warming counterfactual.

As shown in Fig. 4 and Extended Data Fig. 6, impacts are larger at higher levels
of warming, with estimates suggesting that if current NDCs are achieved, global
GDP could be 15%-25% lower by the end of the century as compared to a world 
that did not warm. Impacts for warming beyond 3°C are even larger, but decline 
less steeply at the highest levels of warming (consistent with ref. *). This is because 
for hot countries that are substantially harmed by high levels of warming, GDP 
levels are bounded below by zero, whereas for cold countries that are substantially 
benefited by future warming, GDP levels can grow unbounded. 
Quantifying sources of uncertainty in overall impacts of global warming. Our 
impact estimates (for example, on discounted global world product $\Theta^{jsmk}$ from
equation (10) above) are derived by combining historical regression results, future 
climate change projections from climate models, assumptions on baseline future 
growth rates from SSPs, and discount rates. Each of these has associated uncer- 
tainty, which we propagate throughout the analysis. In particular, total uncertainty 
in the impact of warming on global GDP under a given forcing scenario is a com- 
bination of uncertainty in how economies respond to warming (what we term ‘his- 
torical regression uncertainty’), uncertainty across climate models in the amount 
and pattern of warming for a given level of forcing (‘climate model uncertainty’), 
uncertainty in baseline future growth rates across SSPs (‘SSP uncertainty’), and 
plausible alternatives for how to specify the discount rate (‘discount rate uncer- 
tainty’). To quantify the relative contribution of each to overall impact uncertainty 
under a given level of forcing (RCP), we hold three out of four variables fixed 
and allow the fourth to vary. Variables are fixed as follows: historical regression 
uncertainty is fixed at the regression point estimate, discount rates are fixed at 3%, 
the SSP is fixed at the SSP providing the median impact estimate (typically SSP3), 
and the climate model projection is fixed at the model giving the median global 
warming projection for either mid-century or the end of the century. 

Results for discounted cumulative global GDP loss due to warming are shown 
in Fig. 4b-d. For both 2049 (RCP4.5) and 2099 (RCP2.6), historical regression 
uncertainty—that is, uncertainty in how economies have responded to warm- 
ing in the recent past—is the dominant source of uncertainty in overall impact 
projections for a given forcing level, followed by uncertainty due to alternative 
possible specifications of the discount rate. For instance, holding all other sources 
of uncertainty fixed for the end of the century, historical regression uncertainty 
alone leads to a 95% confidence interval of impact estimates of −US$122 trillion to
US$32 trillion, discount rate uncertainty to a 95% confidence interval of −US$375
trillion to −US$25 trillion, and climate model uncertainty to a 95% confidence
interval of −US$78 trillion to US$4 trillion. Thus the overall uncertainty in impacts
induced by uncertainty in economic parameters is around 2-4 times higher than 
that resulting from climate model uncertainty. 
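
One simple way to read the "2-4 times" comparison is to compare the widths of the three quoted intervals. The sketch below does that arithmetic; treating interval width as the measure of uncertainty is an assumption made here for illustration.

```python
# 95% intervals quoted in the text for end-of-century impacts, in US$ trillion.
intervals = {
    "historical regression": (-122.0, 32.0),
    "discount rate": (-375.0, -25.0),
    "climate model": (-78.0, 4.0),
}

widths = {name: high - low for name, (low, high) in intervals.items()}
ratios = {name: widths[name] / widths["climate model"]
          for name in ("historical regression", "discount rate")}

for name, ratio in ratios.items():
    print(f"{name}: interval {widths[name]:.0f} wide, {ratio:.1f}x the climate-model interval")
```

The widths are 154, 350 and 82 (US$ trillion), giving ratios of about 1.9 and 4.3 relative to the climate-model interval.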

There are multiple caveats to this analysis, including that historical uncertainty 
would be larger if regression models with additional lags were also included, and 
that discount rate uncertainty could be understated if our 14 alternative discount- 
ing approaches do not span the range of ‘plausible’ discount rates. 

While further constraining the range of plausible discount rates is perhaps 
challenging, not least owing to ethical considerations central to the choice of 
social-welfare-based discount rates**, reducing uncertainty around how econo- 
mies will respond to warming could be more tractable. Promising avenues could 
include detailed empirically based bottom-up assessments of climate impacts at 
the country level’, leveraging existing sub-national or firm (company)-level data 
to estimate impacts'>””, or using new fine-scale remote-sensing-based estimates of 
economic output to greatly increase the temporal and spatial specificity of outcome 
measurements*?"”. 

Data availability. All data and code that support the findings of this study are 
available at https://purl.stanford.edu/vn535jm8926. 
Extended Data Fig. 1 | Discount rate scenarios used in calculation of
cumulative discounted impacts of future warming. a, Projected global 
average annual growth rates under SSP1 with and without climate change; 
estimates are averaged across bootstraps and climate models. Projected 
growth rates with climate change are used to define future consumption 
growth in Ramsey-based discount rates. b, Evolution of discount rates 
under different schemes through 2099. Ramsey-based schemes are 
Stern*°, Weitzman?! and Nordhaus’, with corresponding assumptions
about the pure rate of time discount $\rho$ and the elasticity of marginal utility
of consumption $\eta$ shown in parentheses. Dashed lines are versions of
these Ramsey-based discounting schemes that account for growth-rate 
uncertainty. Non-Ramsey schemes are Newell and Pizer* and Groom*®. 

c, Projected average annual growth rates separately for rich and poor 
countries under SSP1, with and without climate change. d, Corresponding 
Ramsey-based discount rates calculated separately for rich and poor 
countries, using income-specific growth rates from c. 
Extended Data Fig. 2 | Global GDP impacts can be negative at +1°C
but positive at +2 °C for some high-temperature-optimum bootstrap 
runs. a, b, Country share of global GDP at baseline (a) and by the end of 
the century (b) under SSP1, assuming no climate change. c, Distribution 
of global GDP by temperature, under baseline (black) and the end of 

the century SSP1 without climate change (red dashed); absent climate 
change, a substantial portion of global GDP is projected to be produced in 
countries with hotter average temperatures. d, Climate-model-predicted 
average global warming under RCP2.6 by the end of the century (x axis) 
versus the correlation between country-level baseline average temperature 
and country-level predicted warming in each model. In models that warm 
less at the global scale, countries that are currently warm tend to exhibit 
relatively larger warming, while in models that warm more at the global 
scale, countries that are currently cool tend to exhibit relatively larger 
warming. Future impacts on global GDP are a sum of country-specific 
impacts, which are a function of where each country is on the temperature 
response function (Fig. 1a) and the projected amount of future warming
in that country; a given percentage impact in a country with a large GDP 
has a larger effect on global GDP than the same percentage impact in 
a country with small GDP. For high-temperature-optimum response 
functions (for example, Fig. 1g), impacts can be negative at +1 °C but
positive at +2 °C because (i) absent climate change, a much larger 
proportion of total global GDP is projected by SSP1 to be produced in 
countries that are currently warmer than the optimum, and (ii) climate 
models with lower overall global warming projections under RCP2.6 tend 
to have higher relative warming in countries that are currently warm. This 
generates negative impacts at about 1 °C, where impacts are dominated by 
negative effects in warm countries (largely in the developing world), but 
positive impacts at about 2 °C, where high-latitude countries instead warm
disproportionately and experience benefits that outweigh the damages in 
tropical countries. 
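
The caption describes the global impact as a sum of country-specific impacts in which a given percentage impact counts for more in a country with a large GDP. A minimal sketch of that weighting follows; the country names, GDP values and impact values are invented for illustration only.

```python
def global_gdp_impact(gdp, pct_impact):
    """GDP-weighted average of country-level percentage impacts.

    gdp: mapping country -> GDP level (any common unit)
    pct_impact: mapping country -> percentage change in that country's GDP
    """
    total = sum(gdp.values())
    return sum(gdp[c] * pct_impact[c] for c in gdp) / total


# Hypothetical two-country world: a large warm economy that loses output
# and a small cool economy that gains.
gdp = {"warm_large": 80.0, "cool_small": 20.0}
impact = {"warm_large": -2.0, "cool_small": 4.0}
print(global_gdp_impact(gdp, impact))
```

In this toy case the global figure is negative even though the cool country gains twice as much in percentage terms, because the warm country holds most of the GDP.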

Extended Data Fig. 3 | Change in cumulative global GDP under 1.5 °C
versus 2 °C global warming by the end of the century under different
discounting schemes. Positive values indicate benefits (reduced losses)
at 1.5 °C versus 2 °C. Each vertical line corresponds to a bootstrap
estimate of benefits under each discounting scheme*?*”?>6. Red lines
indicate median across bootstraps for each discounting scheme. Uniform
schemes correspond to those in Extended Data Table 1; other schemes are
described in Methods.




Extended Data Fig. 4 | Robustness of results to alternative
specifications. Change in global GDP per capita in 2049 and 2099 based
on regression models that include 0, 1 or 5 lags (a and b); bootstrap
schemes that sample by country, five-year block or single year (c and d); or
alternative SSPs (e and f). Top panels show percentage changes in global
GDP per capita under 1.5 °C versus 2 °C; the bottom panels show change in
cumulative global GDP in US$ trillions under a 3% discount rate.
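
Cumulative changes reported "under a 3% discount rate" are present values of a stream of annual changes. The sketch below assumes annual compounding with the first year undiscounted; that timing convention and the input series are illustrative assumptions, not details taken from the article.

```python
def cumulative_discounted(annual_changes, rate):
    """Present value of annual changes at a uniform discount rate.

    Year t (counting from 0) is weighted by 1 / (1 + rate) ** t.
    """
    return sum(x / (1.0 + rate) ** t for t, x in enumerate(annual_changes))


# Hypothetical stream of annual GDP changes (US$ trillions), three years long.
changes = [1.0, 1.0, 1.0]
print(round(cumulative_discounted(changes, 0.03), 4))
print(round(cumulative_discounted(changes, 0.05), 4))
```

A higher uniform rate shrinks the weight on later years, which is why the same benefit stream sums to less at 5% than at 3%.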

Extended Data Fig. 5 | Robustness under alternative warming paths.
Benefit, in terms of per capita GDP (a) and cumulative GDP (b), of
1.5 °C versus 2 °C by end of century under the baseline assumption that
overall projected warming occurs linearly between the baseline year and
2099 (pink), versus projected benefit assuming that all projected warming
occurs by 2049 and temperatures remain constant thereafter (blue). Both
scenarios have the same projected global warming by the end of the
century. For the same level of overall warming by the end of the century,
scenarios with rapid initial warming worsen the overall impacts of climate
change and increase the cumulative benefits of limiting warming to 1.5 °C
versus 2 °C.

Extended Data Fig. 6 | Projected change in global GDP (%) under global
warming by the end of the century, for each SSP. Panels a–e show the
change in GDP for different climate models under different RCP forcing 
scenarios, relative to a no-warming baseline (median bootstrap) for SSPs 
1-5, respectively. Results are as in Fig. 4a, but for each SSP. Each dot 
represents an RCP-climate model projected change in global GDP under 
a given SSP; colours represent the four RCPs. Lines are least-squares fits 
to the points corresponding to the different RCPs with matching colour 
scheme. The three vertical black lines denote the 1.5 °C target, the 2 °C
target and the median-estimated warming expected under current Paris 
commitments (2.9°C)’. Warming is relative to pre-industrial levels. 

Extended Data Table 1 | Change in cumulative global GDP (in US$ trillions) under 1.5 °C versus 2 °C global warming by the end of the
century under different discounting schemes

Discounting scheme          1%     5%    10%    25%    50%    75%    90%    95%    99%
Uniform 5%                 -31    -18    -11      0     11     21     29     33     40
Weitzman                   -29    -17    -12      4     28     65    112    144    217
Weitzman + uncertainty     -29    -17    -12      4     28     65    112    144    217
Uniform 3%                 -95    -54    -32      3     36     65     88    101    119
Nordhaus                   -39    -23    -16      5     39     91    160    206    312
Nordhaus + uncertainty     -39    -23    -16      5     39     91    160    206    313
Weitzman rich/poor         -22    -14     -6     11     46     96    176    244    442
Newell & Pizer            -130    -72    -43      5     49     86    115    132    156
Uniform 2.5%              -129    -73    -43      5     50     87    118    136    161
Groom et al.              -145    -81    -48      5     56     98    132    151    179
Nordhaus rich/poor         -28    -17     -7     17     67    140    257    360    657
Stern                     -240   -136    -88     24    166    336    515    618    840
Stern + uncertainty       -240   -136    -88     24    166    336    515    619    841
Stern rich/poor           -171    -99    -38     71    231    414    623    740   1091


Values show estimated impacts at different quantiles of the estimated impact distribution for each discounting scheme (uniform schemes?’, Weitzman*!, Nordhaus®2, Newell and Pizer?5, Groom?® and 
Stern®°), and correspond to estimates shown in Extended Data Fig. 3. Positive values indicate benefits (reduced losses) at 1.5°C versus 2°C. 

Extended Data Table 2 | Probability that limiting global warming to 1.5 °C will generate benefits relative to 2 °C warming

% Δ global GDP/capita by mid-century

Benefit threshold    RCP4.5    RCP6.0
0%                     0.80      0.73
1.25%                  0.63      0.53
2.50%                  0.42      0.31
3.75%                  0.21      0.13
5.00%                  0.07      0.03

Cumulative Δ global GDP by mid-century (columns are discount rates)

                     RCP4.5                  RCP6.0
Benefit threshold    2.5%    3.0%    5.0%    2.5%    3.0%    5.0%
0                    0.76    0.76    0.75    0.69    0.68    0.68
$10 trillion         0.56    0.53    0.32    0.47    0.43    0.23
$20 trillion         0.33    0.27    0.03    0.24    0.18    0.01
$30 trillion         0.14    0.08    0.00    0.08    0.04    0.00
$40 trillion         0.04    0.01    0.00    0.01    0.00    0.00

% Δ global GDP/capita by end-of-century

Benefit threshold    RCP2.6
0%                     0.76
2%                     0.62
4%                     0.45
6%                     0.27
8%                     0.12
10%                    0.03

Cumulative Δ global GDP by end-of-century (columns are discount rates)

                     RCP2.6
Benefit threshold    2.5%    3.0%    5.0%
0                    0.78    0.78    0.76
$10 trillion         0.72    0.71    0.53
$20 trillion         0.68    0.63    0.28
$50 trillion         0.50    0.38    0.00
$100 trillion        0.18    0.05    0.00
$150 trillion        0.02    0.00    0.00


Left panels show benefits in terms of percentage change in global GDP per capita by mid-century and the end of the century. For instance, by mid-century under RCP4.5 there is a 42% probability of 
benefits exceeding 2.5% of global GDP per capita. Right panels show benefits in terms of cumulative change in global GDP by mid-century and the end of the century, under three different discount rates
for each relevant RCP. For instance, by the end of the century, there is a 50% probability of benefits exceeding US$50 trillion using a discount rate of 2.5%. 
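
Each probability in this table can be read as the share of the estimated impact distribution that lies above a benefit threshold. The sketch below shows that calculation on a made-up list of bootstrap estimates; the sample values are hypothetical and are not taken from the article.

```python
def prob_exceeding(samples, threshold):
    """Share of estimates strictly greater than the threshold."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical bootstrap estimates of the benefit (US$ trillions).
estimates = [-20.0, -5.0, 3.0, 12.0, 25.0, 40.0, 55.0, 70.0, 90.0, 120.0]
for threshold in (0.0, 50.0, 100.0):
    print(threshold, prob_exceeding(estimates, threshold))
```

Raising the threshold can only lower the probability, which matches the monotone decline down each column of the table.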

Extended Data Table 3 | Probability that limiting global warming to 1.5 °C will generate different levels of benefits relative to 2.0 °C
warming, under different discounting schemes

                          Benefit threshold (US$ trillions)
Discounting scheme           0    $10    $20    $40   $100   $200   $350
Uniform 5%                0.76   0.53   0.28   0.01   0.00   0.00   0.00
Weitzman                  0.79   0.67   0.57   0.40   0.13   0.02   0.00
Weitzman + uncertainty    0.79   0.67   0.57   0.40   0.13   0.02   0.00
Uniform 3%                0.78   0.71   0.63   0.47   0.05   0.00   0.00
Nordhaus                  0.80   0.70   0.63   0.50   0.22   0.06   0.01
Nordhaus + uncertainty    0.80   0.70   0.63   0.50   0.22   0.06   0.01
Weitzman rich/poor        0.86   0.76   0.68   0.54   0.24   0.08   0.02
Newell & Pizer            0.78   0.72   0.68   0.55   0.17   0.00   0.00
Uniform 2.5%              0.78   0.72   0.68   0.56   0.18   0.00   0.00
Groom et al.              0.78   0.73   0.69   0.59   0.24   0.00   0.00
Nordhaus rich/poor        0.87   0.80   0.73   0.63   0.38   0.17   0.05
Stern                     0.80   0.78   0.76   0.72   0.62   0.44   0.23
Stern + uncertainty       0.80   0.78   0.76   0.72   0.62   0.44   0.23
Stern rich/poor           0.86   0.85   0.83   0.80   0.70   0.54   0.32

Benefits are in terms of cumulative change in global GDP by the end of the century (RCP2.6). Discounting schemes are: uniform schemes?’, Weitzman*!, Nordhaus®2, Newell and Pizer?5, Groom?® and 
Stern®°.

Inference of ecological and social drivers of human brain-size evolution

Mauricio González-Forero1 & Andy Gardner1*


The human brain is unusually large. It has tripled in size from
Australopithecines to modern humans1 and has become almost
six times larger than expected for a placental mammal of human
size2. Brains incur high metabolic costs3 and accordingly a long-
standing question is why the large human brain has evolved4. The
leading hypotheses propose benefits of improved cognition for
overcoming ecological5–7, social8–10 or cultural11–14 challenges.
However, these hypotheses are typically assessed using correlative
analyses, and establishing causes for brain-size evolution remains
difficult15,16. Here we introduce a metabolic approach that enables
causal assessment of social hypotheses for brain-size evolution. Our 
approach yields quantitative predictions for brain and body size 
from formalized social hypotheses given empirical estimates of the 
metabolic costs of the brain. Our model predicts the evolution of 
adult Homo sapiens-sized brains and bodies when individuals face a 
combination of 60% ecological, 30% cooperative and 10% between- 
group competitive challenges, and suggests that between-individual 
competition has been unimportant for driving human brain-size 
evolution. Moreover, our model indicates that brain expansion in 
Homo was driven by ecological rather than social challenges, and was 
perhaps strongly promoted by culture. Our metabolic approach thus 
enables causal assessments that refine, refute and unify hypotheses 
of brain-size evolution. 

The leading hypotheses for the evolution of brain size make differ- 
ent suggestions as to which cognitive challenges have been the most 
important in driving brain expansion. ‘Ecological-intelligence’ hypoth- 
eses emphasize challenges posed by the non-social environment, for 
example, finding, caching or processing food*”’ (Fig. 1). By contrast, 
‘social-intelligence’ hypotheses emphasize challenges posed by the social 
environment, for example, cooperating for resource extraction!®!>, 
manipulating others, avoiding manipulation or forming coalitions 
and alliances to outcompete others®? (Fig. 1). Social challenges have
been suggested to constitute particularly powerful drivers of brain
expansion, because they may have triggered evolutionary arms races 
in cognition®”. Finally, ‘cultural-intelligence’ hypotheses emphasize 
challenges of learning from others, teaching and doing so when there is 
accumulated cultural knowledge'!"'*. Empirical tests of these hypoth-
eses customarily investigate phylogenetically controlled correlations
between brain size (or the size of brain components) and candidate 
selective factors (for example, diet type>!”, tactical-deception rate!®, 
group size’! and social-learning proclivity”). However, establishing 
causality has proven to be difficult. For example, given a positive cor- 
relation, it is unclear whether large group sizes favour bigger brains or 
big brains enable larger group sizes'©. Moreover, there is the quantitative 
problem of explaining not only why bigger brains are favoured, but also 
why they are favoured to the particularly large size observed in humans 
(around 1.3 kg for a body size of approximately 50 kg in females”!”’).
To address these problems, we merge elements of metabolic theory”’, 
life-history theory and differential games to obtain quantitative predic- 
tions for the evolution of brain and body size when individuals face 
ecological and social challenges given metabolic costs of the brain. Our 
approach incorporates social interactions into a previous non-social 
model*4 (Supplementary Information 1-3). As a first approximation, 
we consider a female population and partition the body mass of each 
individual into three tissues: ‘brain’, ‘reproductive’ and other ‘somatic’
tissue (Fig. 2a). Part of the energy consumption of reproductive tissue
is for the production and maintenance of offspring, whereas part of
the energy consumption of the brain is for production (learning) and main-
tenance (memory) of energy-extraction skills. Accordingly, at each age
the individual has a certain skill level measured in information units 
(that is, bytes). She extracts energy by using her skill level to overcome 
ecological or social energy-extraction challenges. Success in an ecolog- 
ical challenge depends on her own skill level, whereas success in a social 
challenge depends on her skill level and that of her social partners. We
consider three types of social challenge: ‘cooperative’, in which the
individual’s skill level and that of a social partner of the same age (hereafter
‘peer’) interact to overcome a challenge; ‘between-individual competitive’,
in which the individual uses her skill level against that of a peer
to extract energy; and ‘between-group competitive’, in which the
individual’s skill level and that of a peer act together and against the skills
of another two peers (that is, one coalition competing against another).
We assume that during any small time-interval, the individual faces
energy-extraction challenges, a proportion ($P_j$) of which are of type $j$
(collectively denoted by P, with $j = 1, \ldots, 4$ indexing the four challenge
types and $\sum_{j=1}^{4} P_j = 1$). Given P, the growth strategy controls the
amount of energy allocated to the production of each tissue throughout
life, and we let the growth strategy evolve. The individual’s energy-extraction
efficiency (EEE) thus depends on her skill level, the skill
levels of her social partners and the challenges faced, and we consider
two shapes for EEE (Fig. 2b and Extended Data Fig. 1). We use previously
published data, primarily from Kuzawa et al.”!, for parameter
estimates, including metabolic costs of the brain (Fig. 2c, Methods and
Extended Data Figs. 2, 3).

Fig. 1 | Ecological and social hypotheses for brain expansion. Ecological
hypotheses emphasize challenges ‘against nature’, whereas social
hypotheses emphasize challenges involving social partners. Here we
partition these hypotheses into four types of challenges that are expected
to trigger different evolutionary processes. Panel labels recovered from
the figure: Social; Cooperative; ‘Me versus nature’.

Fig. 2 | Model description. a, Schematic description of the model (see text
for details). Numbers in parentheses indicate the corresponding equations
in the Methods. b, c, The model depends on the shape of EEE with respect
to skill (b) and three sets of parameters P, Q and R (c).

We find that increasing the proportion of cooperative challenges 
decreases both adult absolute brain size (hereafter ‘brain size’) and
adult relative brain size (hereafter ‘encephalization quotient’, which
is the adult brain size divided by expected brain size for a given body
size’; Fig. 3a–c and Extended Data Fig. 4). By contrast, increas-
ing the proportion of between-individual competitive challenges
increases brain size when EEE is weakly decelerating with skill 
(Fig. 3a), but decreases brain size when EEE is strongly decelerating 
(Fig. 3b and Extended Data Fig. 4). However, although between- 
individual competition increases brain size with weakly decelerating 
EEE, the result is larger brains and smaller bodies than those of 
modern humans (Fig. 3a and Extended Data Fig. 4). Between- 
individual competition also decreases body mass as it increases the 
difficulty of energy extraction and thus limits the energy available 
for body growth; consequently, between-individual competition 
increases the encephalization quotient, either because brain size 
increases and body size decreases or because brain size decreases 
and body size decreases more strongly than brain size. Increasing 
the proportion of between-group competition generally decreases 
brain size, but increases the encephalization quotient, because body 
size decreases more strongly than brain size (Fig. 3a, b). However,
with weakly decelerating EEE and additive cooperation, between-group
competition increases both brain size and the encephalization
quotient (Extended Data Fig. 4). Moderately frequent between-individual
or between-group competition can lead to no allocation
to brain and body growth (blue dots in Fig. 3a and Extended Data
Fig. 4; see also Extended Data Fig. 5a, d); additionally, moderately
frequent between-group competition in the presence of substantial
cooperation can lead to arms races in brain size, which fail to yield
stable, large brains (for example, because of cycling in brain size
or eventual collapse to no allocation into brain growth (Extended
Data Fig. 5)). This is because energy extraction becomes exceedingly
difficult in the presence of large-brained competitors such
that investments in brain or body growth do not pay off and the
individual instead invests in early reproduction.

Fig. 3 | Effects of challenge types on brain size and best-fitting scenario
for adult H. sapiens. a, b, Effects of increasing the proportion of a
challenge type while decreasing the proportion of ecological challenges.
EQ, encephalization quotient. a, Weakly decelerating EEE (exponential
competence). b, Strongly decelerating EEE (power competence). Dot
colour indicates the adult fit with brain and body mass of H. sapiens
(that is, $-D(\tau_a)$; Supplementary Information 6). Zero adult fit means
a perfect fit. c, Qualitative effects of challenge types on brain mass and
encephalization quotient. d, Best-fitting scenario for H. sapiens. Dots
indicate the adult fit for every challenge combination that was solvable.
Shaded regions indicate the simplex in which $P_j$ can vary. The best fit
occurs in P* = (0.6, 0.3, 0, 0.1) (adult fit: −0.03). e, ‘High fit’ intervals
around P* where adult fit is greater than −0.05 and dots are interpolated
points with the best fit.

Fig. 4 | Best-fitting scenarios across adult Homo, and resulting
predicted life history for H. sapiens. a, Best adult fitting scenarios across
Homo (figure modified with permission from figure 8.1 of ref. 1). Pie
charts and plots respectively show the challenge combination and shape
of EEE versus skill that yielded the best adult fit using the same Q and R
parameters (Extended Data Figs. 6, 7). b, Life history with the challenge
combination yielding the best adult fit for H. sapiens. Resulting life periods
are indicated above the body mass plot. Vertical lines are ages at which
the growth strategy changes suddenly; within childhood, they occur when
brain growth begins and terminates.

To determine if any combination of social-challenge parameters P 
yields an accurate prediction of adult brain and body sizes of H. sapiens 
and closely related species, we obtained solutions exhaustively across 
the P parameter space while holding the other parameters (Q and R) 
fixed (Fig. 3d, e; Supplementary Information 5). We find near-perfect 
adult fits across Homo species (Fig. 4a and Extended Data Figs. 6-8). 
A near-perfect adult fit for H. sapiens occurs with a large proportion of 
ecological challenges (approximately 60%), a moderate proportion of 
cooperative challenges (around 30%), a small proportion of between- 
group competitive challenges (around 10%), and an approximately 
complete absence of between-individual competitive challenges 
(around 0%) (Figs. 3e, 4a and Extended Data Figs. 6, 9). In the resulting 
reconstruction for Homo, ecological challenges increase brain size 
whereas social challenges decrease it (Extended Data Fig. 4), the pro-
portion of ecological challenges tends to increase from early to late
Homo species, and a steep increase in encephalization quotient from 
Homo ergaster to Homo heidelbergensis is due to a transition from 
strongly to weakly decelerating EEE (Fig. 4a). The adult best-fit eco- 
social scenario for H. sapiens also yields a predicted life history that 
closely matches the species life-history timing (Fig. 4b and Extended 
Data Fig. 10). The resulting ontogenetic fit is high for body size, but 
lower for brain size early in ontogeny (Fig. 4b), perhaps caused in part 
by our use of a power-law relationship between resting metabolic rate 
and body mass that underestimates resting metabolic rate early in the 
ontogeny”. With the adult brain size resulting from the best-fitting 
scenario for H. sapiens ($x_b(\tau_a) = 1.276$ kg), the predicted adult skill
level for energy extraction is 3.92 terabytes (TB), which can be calcu-
lated with an equation transforming brain mass to skill level*4
($\hat{x}_k = s_k B_b x_b(\tau_a)/B_k$, where $\hat{x}_k$ is the asymptotic skill level in adulthood;
equation (5) in the Methods). By comparison, current rough estimates”?
suggest a human-neocortex storage capacity of approximately 600 TB 
(Supplementary Information 4.3). 
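
The exhaustive search over challenge proportions described above can be pictured as a grid search on the simplex of four proportions that sum to 1. The sketch below is only an illustration of that idea: the grid step of 0.1 is an assumption, and `toy_adult_fit` is a hypothetical stand-in that peaks at the reported best-fitting combination, not the model's actual fit measure.

```python
from itertools import product


def simplex_grid(n_types=4, steps=10):
    """All proportion vectors with entries k/steps that sum to 1."""
    grid = []
    for ks in product(range(steps + 1), repeat=n_types - 1):
        last = steps - sum(ks)
        if last >= 0:  # keep only points that lie on the simplex
            grid.append(tuple(k / steps for k in (*ks, last)))
    return grid


def best_fit(grid, adult_fit):
    """Point with the highest fit (the text treats zero as a perfect fit)."""
    return max(grid, key=adult_fit)


# Hypothetical stand-in fit: negative squared distance to the reported
# best combination (ecological, cooperative, between-individual, between-group).
TARGET = (0.6, 0.3, 0.0, 0.1)


def toy_adult_fit(p):
    return -sum((a - b) ** 2 for a, b in zip(p, TARGET))


grid = simplex_grid()
print(len(grid), best_fit(grid, toy_adult_fit))
```

In the article, each grid point would instead require solving the model for the uninvadable growth strategy, and some combinations are reported as not solvable.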

Using previously published data”! for parameter estimates, our 
results suggest that adult human-sized brains and bodies may result 
from ecological challenges as drivers of brain expansion, with coop- 
eration and between-group competition decreasing brain and body 
size and between-group competition increasing the encephalization 
quotient by decreasing body size more strongly than brain size (Fig. 3a 
and Extended Data Fig. 4b). In this eco-social scenario, between- 
individual competition is unimportant, as it does not lead to human-sized 
brains and bodies. Cooperation decreases brain size, because it allows 
individuals to rely on their partners’ skills and thus decrease their own 
investment into costly brains (cooperation invites cheating), which is 
consistent with diminished brain sizes in cooperatively breeding birds”° 
and mammals’, including primates”®. For instance, among mole rats,
naked mole rats are the most specialized in cooperative breeding and 
have the smallest relative brain size”? (however, allomaternal care and 
brain size are positively associated in mammals”, but allomaternal care 
constitutes cooperation targeted at young, which vanishes in adult- 
hood as opposed to the peer-cooperation studied here). Similarly, 
between-group competition can decrease brain size probably because 
between-group competition involves cooperation between group mem- 
bers, allowing individuals to rely on their partners’ skill. The result 
that exceedingly frequent competition decreases absolute and relative 
brain size may be relevant to the observed decreased brain size in ceta- 
ceans with the largest group sizes'®. Cooperation can also decrease 
body size in our model, because when brain size is disfavoured so too 
can be body size. This is because a consequence of our model is that a 
key reason to grow somatic tissue is to make energy available for brain 
growth: increasing the mass of inexpensive somatic tissue can increase 
the energy available for tissue (and brain) growth due to the physical 
constraint imposed by the power-law relationship between resting met- 
abolic rate and body mass”*. 
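
The encephalization quotient used throughout is the adult brain size divided by the brain size expected for a given body size. The sketch below assumes the expectation is an allometric power law of body mass; the coefficient and exponent are hypothetical placeholders supplied by the caller, since the text here does not state the expected-size formula.

```python
def expected_brain_mass(body_kg, coeff, exponent):
    """Assumed allometric expectation: coeff * body_kg ** exponent."""
    return coeff * body_kg ** exponent


def encephalization_quotient(brain_kg, body_kg, coeff, exponent):
    """Observed brain mass divided by the expected brain mass."""
    return brain_kg / expected_brain_mass(body_kg, coeff, exponent)


# Hypothetical allometric parameters, for illustration only.
COEFF, EXPONENT = 0.01, 0.75

# Shrinking the body while holding brain mass fixed raises the quotient.
print(encephalization_quotient(1.3, 50.0, COEFF, EXPONENT))
print(encephalization_quotient(1.3, 40.0, COEFF, EXPONENT))
```

This is the mechanism the text invokes when a challenge type raises the quotient by reducing body size more strongly than brain size.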

Overall, our assessment fails to support social hypotheses as expla- 
nations for the evolution of human brain size, and is more consistent 
with ecological hypotheses. Our results suggest causal interpretations 
that differ from some current thinking on the evolution of human cog- 
nition. Specifically, we obtained an eco-social scenario that involves a 
substantial proportion of cooperation (30% against nature and 10% 
against others), which could shape cognition towards cooperation. 
This would help to explain aspects of human cognition that facilitate 
cooperation!!, even if cooperation has not been a driver of human 
brain expansion. Additionally, because our analysis suggests that brain 
expansion in Homo has not been driven by peer cooperation or com- 
petition, our results indicate that social complexity may have had a 
more limited role in human brain size expansion than is commonly 
thought—although we emphasize that our analysis is an illustrative 
starting point and future extensions are encouraged (see Supplementary 
Information 9). Therefore, our results highlight the fundamental ques- 
tion of why ecological challenges would have favoured substantial brain 
expansion in humans but less so in other taxa!®'®. One clue is suggested
by our finding that H. sapiens-sized brains and bodies can be obtained 
only under weakly decelerating EEE (Figs. 3, 4 and Extended Data 
Fig. 6a): in other words, only when young individuals can maintain 
a substantial rate of increase in their efficiency of energy extraction 
as they acquire skills. One possibility is that culture (or cumulative 
culture) facilitates weakly decelerating EEE if learning from the pool 
of skills in the population allows individuals to maintain a relatively 
high rate of increase in EEE as their skill level increases when young. 
More specifically, the evolution of progressively elaborated social learn- 
ing, teaching and language'!~'* may have enabled young individuals 
to continue gaining skills with age, possibly promoting less strongly 
decelerating EEE. In this respect, our results are consistent with aspects 
of various cultural hypotheses for brain evolution!*"4 and an explicit 
account of the effect of culture on EEE could help to address whether 
culture (or cumulative culture) has enabled ecological challenges to 
drive brain expansion in humans in ways that have not occurred in 
other taxa. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586-018-0127-x.

METHODS


Here we summarize our model; see Supplementary Information for a detailed 
description of the methods. No experiments were performed in this study and no 
empirical data were collected. 

Model description. We consider a female population with overlapping gener- 
ations and partition the body mass of each individual into three tissues: brain, 
reproductive and other somatic tissues. Each individual extracts energy from 
the environment at each time point to grow and maintain her tissues. We 
assume that some of the energy consumed by reproductive tissue is for pro- 
duction and maintenance of offspring, while some of that consumed by brain 
tissue is for production (learning) and maintenance (memory) of domain- 
general, energy-extraction skills. Accordingly, at each age the individual has a 
certain skill level measured in information units (that is, bytes). She extracts 
energy by using her skill level to overcome energy-extraction challenges that 
can be ecological or social. Success in an ecological challenge depends on her 
skill level and on the challenge difficulty, which is determined by the 
(non-evolving) environment. By contrast, success in a social challenge depends 
on her skill level and that of her social partners. We consider three different 
types of social challenge: cooperative challenges, in which the individual’s skill 
level and that of a social partner of the same age (peer) interact to overcome a 
challenge that has a difficulty that is determined by the environment; 
between-individual competitive challenges, in which the individual uses her 
skill level against that of a peer to extract energy, such that the difficulty of 
energy extraction is determined by her competitor’s skill level; and between- 
group competitive challenges, in which the individual's skill level and that of 
a peer act together and against the skills of another two peers (that is, one 
coalition competing against another), which determines the challenge diffi-
culty. During any small time-interval, the individual faces challenges, a pro-
portion $P_j$ of which are of type $j$ (with $j = 1, \ldots, 4$ indexing the four challenge
types and $\sum_{j=1}^{4} P_j = 1$). For instance, $P_1 = 1$ denotes that individuals face only
ecological challenges, whereas $P_1 = P_2 = 0.5$ denotes that individuals face only
ecological and cooperative challenges, in equal proportions. We define
the growth-metabolic rate as the rate of heat release by a resting individual due 
to tissue production. Moreover, we define the growth strategy as the fraction 
of the growth-metabolic rate due to the production of each tissue throughout 
life. Thus, the growth strategy generates an ontogenetic profile of brain and 
body size. We consider that the growth strategy evolves by natural selection, 
and study its evolution using standard evolutionary-invasion analysis; that is, 
we consider the increase in frequency by selection of rare genetic mutations 
that control the growth strategy. There is a stable monomorphic female brain 
size in the population when rare mutants of the growth strategy cannot invade 
the population; that is, such resident growth strategy is ‘uninvadable’*!*?. We 
obtain an uninvadable growth strategy using evolutionary-invasion analysis 
for function-valued traits, since the growth strategy is a function of time (age). 
Because skill level depends (though not exclusively) on brain size due to energy 
conservation principles‘, the evolution of brain size causes the evolution of 
skill level. Accordingly, a cooperating partner’s skill level and the difficulty of 
competitive challenges are evolving environments, which constitute the ulti- 
mate distinction between ecological and social challenges in our analysis. This 
evolving environment implements the notion that sociality can yield evolu- 
tionary arms races in cognition as proposed by social hypotheses* !”. 

Energy-extraction efficiency. An important quantity in the model is the individual’s
EEE, defined as the rate of energy extraction divided by the rate of energy extraction if 
the individual is maximally successful at energy extraction. We model the individual's 
EEE as a function of her skill level and that of cooperating or competing peers. To do 
this, we consider two mathematical functions commonly used in contest models: a 
‘power competence’ function that allows for strongly decelerating EEE as the indi- 
vidual gains skills when she is young, and an ‘exponential competence’ function that 
allows for weaker deceleration (Fig. 2b and Extended Data Fig. 1c). We also let the 
skills of cooperating partners interact in an additive, multiplicative or submultiplicative 
(geometric mean) way (the geometric mean is a good descriptor of the average skill in 
the pair if peers have disparate skill levels). Additionally, we assume that if a sufficiently 
young individual fails to overcome a challenge, then she can extract energy from an 
environment facilitated by her mother. 
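
The two competence shapes and the three ways of combining partners' skills can be sketched as follows. This is an illustration under stated assumptions, not the article's equations: it assumes the ratio form common in contest models (competence of the skill divided by the summed competences of skill and difficulty), and the names `gamma`, `contest_success` and `joint_skill`, as well as the plain-sum reading of 'additive', are hypothetical choices made here.

```python
import math


def power_competence(skill, gamma=1.0):
    """Power form: competence grows as skill ** gamma."""
    return skill ** gamma


def exponential_competence(skill, gamma=1.0):
    """Exponential form: competence grows as exp(gamma * skill)."""
    return math.exp(gamma * skill)


def contest_success(skill, difficulty, competence):
    """Ratio-form success (assumed): c(skill) / (c(skill) + c(difficulty))."""
    c_skill, c_diff = competence(skill), competence(difficulty)
    return c_skill / (c_skill + c_diff)


def joint_skill(x, y, mode):
    """Combine two cooperating partners' skills."""
    if mode == "additive":
        return x + y
    if mode == "multiplicative":
        return x * y
    if mode == "submultiplicative":  # geometric mean of the pair
        return math.sqrt(x * y)
    raise ValueError(f"unknown mode: {mode}")


pair = joint_skill(4.0, 9.0, "submultiplicative")
print(pair, contest_success(pair, 6.0, power_competence))
```

In a competitive challenge the `difficulty` argument would be the competitor's (or competing pair's) skill, so it evolves along with the population, whereas in an ecological challenge it is a fixed property of the environment.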

Parameters. The model has 4 basic parameters, collectively denoted by P, that 
specify the proportion of each social challenge, and the effects of which we 
study here; 13 further parameters, collectively denoted by Q, that measure the 
metabolic costs of the brain and other tissues, the size of the brain and other 
tissues at birth and the demography of the population, for which empirical 
estimates are available; and a final 9 parameters, collectively denoted by R, 
that measure skill metabolic costs, maternal provisioning, mutation size and 
how skill level affects energy extraction, for which we use reasonable values 
given the available data (Fig. 2c; Supplementary Information 4). For example, 
R parameters include the metabolic cost of memory, and the values we use for
this (in megajoules per year per terabyte) fall within an empirically estimated
range for resting energy consumption for stored motor patterns in cerebellum
Purkinje cells in rats. The exact values used for R are chosen within such
reasonable ranges as they yield a high ontogenetic fit between predicted and
observed body and brain mass in H. sapiens when there are only ecological
challenges (that is, $P_1 = 1$; Extended Data Fig. 3g, h). This approach is a
reasonable starting point given that the fundamental constraint for a large brain
is thought to be the metabolic costs of the brain, which are incorporated in the
estimated Q parameters. The values chosen for the R parameters mean that the
difficulty of ecological challenges is high but not exceedingly so, memory is
metabolically expensive (although in the low end of the empirically estimated
range), and skills are moderately effective at overcoming the challenges. Using
these Q and R parameter values, it was previously shown that ecological
challenges alone can generate adult brain and body sizes of ancient human scale:
of late H. erectus scale with strongly decelerating EEE and of Neanderthal scale
with weakly decelerating EEE. Here we use the same Q and R parameter
values to study the effects of the social-challenge parameters P.

Key equations. We assume that the population is large and mostly constituted by
individuals with a resident growth strategy and by vanishingly rare individuals with
a mutant growth strategy. At age t, a focal mutant individual has a mass of tissue
i (for $i \in \{b, r, s\}$, for brain, reproductive and somatic) of $x_i(t)$ (in kilograms) and a
skill level of $x_k(t)$ (in terabytes). The growth rate of tissue $i \in \{b, r, s\}$ is

$$\dot{x}_i(t) = \frac{u_i(t)}{E_i}\left[B_{\mathrm{rest}}(t) - \sum_{j \in \{b,r,s\}} B_j x_j(t)\right] \qquad (1)$$


where $\dot{x}_i(t)$ denotes the derivative of $x_i(t)$ with respect to t. The term in square
brackets is the growth-metabolic rate ($B_{\mathrm{syn}}(t)$), which equals the resting metabolic
rate, $B_{\mathrm{rest}}(t)$, minus the maintenance metabolic rate, $\sum_{j \in \{b,r,s\}} B_j x_j(t)$. The metabolic
cost of producing (respectively, maintaining) a mass unit of tissue i is $E_i$ (respectively,
$B_i$). The growth strategy is the fraction of the growth-metabolic rate due to
the production of each tissue throughout life, and for the mutant it is denoted by
$u_i(t)$ for all t and all $i \in \{b, r, s\}$ (or u for short). We let the growth strategy be the
evolving trait. In turn, the mutant skill growth rate is

$$\dot{x}_k(t) = \frac{1}{E_k}\left[s_k B_{\mathrm{rest,b}}(t) - B_k x_k(t)\right] \qquad (2)$$


The brain metabolic rate is $B_{\mathrm{rest,b}}(t)$, and the metabolic cost of gaining (respectively,
maintaining) a skill unit is $E_k$ (respectively, $B_k$). The fraction of brain metabolic rate
allocated to energy-extraction skills is $s_k$. Resting metabolic rate is a power function of
body mass, $x_B(t)$, and a function of EEE, which we denote by $e(x_k(t), y_k(t))$:

$$B_{\mathrm{rest}}(t) = K\, e(x_k(t), y_k(t))\, x_B(t)^{\beta} \qquad (3)$$


where $y_k(t)$ is the skill level at age t of a resident individual. The brain metabolic
rate is the sum of the brain’s maintenance and growth metabolic rates:

$$B_{\mathrm{rest,b}}(t) = B_b x_b(t) + E_b \dot{x}_b(t) \qquad (4)$$


An uninvadable growth strategy $u_i^*(t)$ for all t and all $i \in \{b, r, s\}$ (or $u^*$) is a best
response to itself (similar to a Nash equilibrium) regarding the lifetime number of
offspring it yields (see ‘Evolutionary differential game’). We denote the tissue
mass and skill level resulting from an uninvadable growth strategy as $x_i^*(t)$ for all
t and all $i \in \{b, r, s, k\}$ (or $x^*$ for short).
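
As an illustration only, the right-hand sides of equations (1)–(4) can be evaluated at a single age as in the following minimal Python sketch. It is not the code deposited with the study: the function name, the dictionary layout and every numerical value are hypothetical, and body mass $x_B$ is assumed here to be the sum of the three tissue masses, which this section does not state explicitly.

```python
def growth_rates(x, u, e, p):
    """Evaluate the right-hand sides of equations (1)-(4) at one age.

    x: tissue masses 'b', 'r', 's' (kg) and skill level 'k' (TB)
    u: growth-strategy fractions for 'b', 'r', 's'
    e: energy-extraction efficiency at this age (between 0 and 1)
    p: parameters; all names and values here are illustrative assumptions
    """
    tissues = ("b", "r", "s")
    # Assumption: body mass is the sum of brain, reproductive and somatic mass.
    body_mass = sum(x[i] for i in tissues)
    # Equation (3): resting metabolic rate as a power function of body mass.
    b_rest = p["K"] * e * body_mass ** p["beta"]
    # Maintenance metabolic rate summed over the three tissues.
    b_maint = sum(p["B"][i] * x[i] for i in tissues)
    # Growth-metabolic rate: the bracketed term of equation (1).
    b_syn = b_rest - b_maint
    # Equation (1): each tissue receives the fraction u_i of b_syn.
    dx = {i: u[i] * b_syn / p["E"][i] for i in tissues}
    # Equation (4): brain metabolic rate = maintenance + growth.
    b_rest_b = p["B"]["b"] * x["b"] + p["E"]["b"] * dx["b"]
    # Equation (2): skill growth rate.
    dx["k"] = (p["s_k"] * b_rest_b - p["B"]["k"] * x["k"]) / p["E"]["k"]
    return dx


# Hypothetical parameter values, chosen only to make the sketch runnable.
params = {
    "K": 100.0, "beta": 0.75, "s_k": 0.5,
    "B": {"b": 300.0, "r": 100.0, "s": 20.0, "k": 30.0},
    "E": {"b": 400.0, "r": 400.0, "s": 400.0, "k": 200.0},
}
state = {"b": 1.0, "r": 0.5, "s": 40.0, "k": 5.0}
rates = growth_rates(state, {"b": 0.2, "r": 0.3, "s": 0.5}, 0.8, params)
```

A growth strategy in the model is a function of age, so a full trajectory would call such a function repeatedly inside an ODE integrator; the sketch only shows how the four equations fit together at one instant.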

Switching times. With the parameter values that we use, the uninvadable
growth strategies typically produce a life history with four critical ages at which
the growth strategy changes suddenly (called switching times in optimal control
terminology): the age of brain growth onset $t_{b0}$, which is when allocation
to brain growth starts; the age of brain growth arrest $t_b$, when allocation to
brain growth stops; the age at maturity $t_m$, when allocation to growth of
reproductive tissue starts; and the age at adulthood $t_a$, when allocation to growth of
non-reproductive tissues stops. These four ages are an output, not parameters,
of the model.

Asymptotic skill level. In adulthood (that is, after $t_a$), brain growth is absent and,
when memory is expensive enough, skill growth asymptotically ceases.
Specifically, $\dot{x}_b^*(t) = 0$ for $t > t_a$ and $\dot{x}_k^*(t)$ tends to 0 as t tends to T, where T is the
age of menopause. Substituting this and equation (4) into equation (2) yields the
asymptotic skill level

$$\hat{x}_k = s_k \frac{B_b}{B_k}\, x_b^*(t_a) \qquad (5)$$
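
The substitution can be written out step by step. The following worked derivation uses only equations (2) and (4) and the two adulthood conditions stated above; writing the asymptotic skill level as $\hat{x}_k$ is a notational assumption.

```latex
\begin{align*}
B_{\mathrm{rest,b}}(t) &= B_b x_b^*(t) + E_b \dot{x}_b^*(t) = B_b x_b^*(t_a)
  && \text{equation (4) with } \dot{x}_b^*(t) = 0 \text{ for } t > t_a \\
0 &= \frac{1}{E_k}\left[ s_k B_b x_b^*(t_a) - B_k \hat{x}_k \right]
  && \text{equation (2) with } \dot{x}_k^* \to 0 \\
\hat{x}_k &= s_k \frac{B_b}{B_k}\, x_b^*(t_a)
  && \text{solving for } \hat{x}_k
\end{align*}
```

The asymptotic skill level is therefore proportional to adult brain mass, with a constant of proportionality set by the share of brain metabolic rate devoted to skills and by the ratio of brain to skill maintenance costs.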


Equations for EEE. In Supplementary Information 2.1, we show that EEE can
be written as

$$e(x_k(t), y_k(t)) = \sum_{j=1}^{4} P_j \left[\frac{c_j}{c_j + d_j} + \frac{d_j}{c_j + d_j}\,\varphi\right] \qquad (6)$$


The term in square brackets (denoted by $e_j$) gives the EEE when facing a challenge
of type j and is composed of two terms. The first term is the proportion of time that
the individual succeeds at the challenge, and the second term is the proportion of
time that the individual fails at the challenge but extracts energy from maternal
provisioning. The individual’s competence at a type-j challenge is $c_j$ and in general
depends on her own and on her social partners’ skill level. The difficulty of a type-j
challenge is $d_j$ and in general depends on the social and non-social environment
(that is, on the social partners’ skill level and the constant environment). Because
the mutant is rare, her social partners are residents. The EEE from maternal
provisioning is $\varphi$, which decreases with age.

Using our assumption of domain-general skills, we let the competence function
be independent of the challenge type, $c_j(G_j(x_k, y_k)) = c(G_j(x_k, y_k))$, where $G_j(x_k, y_k)$ is
a production function describing how the skills of the cooperating partners interact
(that is, in an additive, multiplicative or submultiplicative way). We consider two
forms for the competence function:

$$c(G_j(x_k, y_k)) = \begin{cases} G_j(x_k, y_k)^{\gamma} & \text{with power competence} \\ \exp(G_j(x_k, y_k))^{\gamma} & \text{with exponential competence} \end{cases} \qquad (7)$$

where $\gamma$ measures the effectiveness (decidability) of skills at the challenge. The
production function $G_j(x_k, y_k)$ is

$$G_j(x_k, y_k) = \begin{cases} x_k & \text{for } j \in \{1, 3\} \\ x_k + y_k & \text{for } j \in \{2, 4\} \text{ with additive cooperation} \\ x_k\, y_k & \text{for } j \in \{2, 4\} \text{ with multiplicative cooperation} \\ \sqrt{x_k\, y_k} & \text{for } j \in \{2, 4\} \text{ with submultiplicative cooperation} \end{cases} \qquad (8)$$
The difficulty $d_j$ of a challenge depends on the challenge type. For an ecological
or a cooperative challenge, the challenge difficulty is $\alpha$, which depends on the
ecological environment, which we assume is constant (this assumption can be
relaxed in future extensions; Supplementary Information 9). In turn, the difficulty
of a competitive challenge depends on the skill level of the individual’s competitors.
Since the mutant is rare, a mutant’s competitors are residents, so the difficulty of a
competitive challenge is the competence of the resident, $c(G_j(y_k, y_k))$. In general,
the difficulty of a type-j challenge is

$$d_j(y_k) = \begin{cases} \alpha & \text{for } j \in \{1, 2\} \\ c(G_j(y_k, y_k)) & \text{for } j \in \{3, 4\} \end{cases} \qquad (9)$$


We let the EEE from maternal provisioning when the individual is of age t be

$$\varphi(t) = \varphi_0 \exp(-\varphi_r t) \qquad (10)$$

where $\varphi_0$ is the EEE from maternal provisioning at birth and $\varphi_r$ measures the rate
of decrease of maternal provisioning. The resulting equations for $e_j$ for all cases
considered can be found in Supplementary Information 2.3.
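
To make the structure of equations (6)–(10) concrete, the following Python sketch computes the EEE of a mutant with skill level x_k facing residents with skill level y_k. It is a minimal illustration under stated assumptions, not the authors' implementation: all function and argument names are hypothetical, and challenge types with zero proportion are skipped so that unused types cannot cause a division by zero.

```python
import math


def production(j, x_k, y_k, cooperation="submultiplicative"):
    """Equation (8): joint skill G_j for challenge type j in 1..4."""
    if j in (1, 3):  # individual challenges: only own skill counts
        return x_k
    if cooperation == "additive":
        return x_k + y_k
    if cooperation == "multiplicative":
        return x_k * y_k
    if cooperation == "submultiplicative":
        return math.sqrt(x_k * y_k)  # geometric mean of the two skill levels
    raise ValueError(f"unknown cooperation case: {cooperation}")


def competence(g, gamma, kind="power"):
    """Equation (7): power or exponential competence."""
    if kind == "power":
        return g ** gamma
    if kind == "exponential":
        return math.exp(g) ** gamma
    raise ValueError(f"unknown competence case: {kind}")


def difficulty(j, y_k, alpha, gamma, kind, cooperation):
    """Equation (9): constant for j in {1, 2}, resident competence otherwise."""
    if j in (1, 2):
        return alpha
    return competence(production(j, y_k, y_k, cooperation), gamma, kind)


def eee(x_k, y_k, t, P, alpha, gamma, phi0, phi_r,
        kind="power", cooperation="submultiplicative"):
    """Equation (6): EEE as a P-weighted sum over the four challenge types."""
    phi = phi0 * math.exp(-phi_r * t)  # equation (10): maternal provisioning
    total = 0.0
    for j, P_j in zip((1, 2, 3, 4), P):
        if P_j == 0:
            continue  # challenge type not faced; skip to avoid 0/0 cases
        c = competence(production(j, x_k, y_k, cooperation), gamma, kind)
        d = difficulty(j, y_k, alpha, gamma, kind, cooperation)
        total += P_j * (c / (c + d) + d / (c + d) * phi)
    return total


# Illustrative call with made-up values: only ecological challenges (P_1 = 1).
example = eee(x_k=3.0, y_k=1.0, t=0.0, P=(1.0, 0.0, 0.0, 0.0),
              alpha=1.0, gamma=1.0, phi0=0.6, phi_r=0.1)
```

Keeping the production, competence and difficulty functions separate mirrors the way the text defines them, so each cooperation or competence case can be swapped without touching the sum in equation (6).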
Evolutionary differential game. Let $R_0(u, v)$ be the expected lifetime number of
offspring of a mutant with growth strategy u when the resident growth strategy is v.
We assume that the population is kept at a constant size due to density-dependent
competition through fertility rather than survival. Hence, an uninvadable growth
strategy $u^*$ maximizes the mutant’s expected lifetime number of offspring when $u^*$
is resident. That is, an uninvadable growth strategy $u^*$ satisfies

$$u^* \in \underset{u \in U}{\arg\max}\; R_0(u, u^*) \qquad (11)$$

where U is the set of feasible growth strategies. Assuming that the mortality rate $\mu$
is constant (which can be relaxed in future extensions; Supplementary
Information 9) and that reproductive tissue is narrowly defined so that it is not
involved in offspring maintenance (for example, defined as preovulatory ovarian
follicles), the mutant’s expected lifetime number of offspring when v is resident is

$$R_0(u, v) \propto \int_0^{T} \exp(-\mu t)\, x_r(t)\, dt \qquad (12)$$
Thus equation (11) poses a differential game problem: it is a ‘game’ between mutant
and resident because the mutant’s payoff ($R_0(u, v)$) depends on the resident
strategy, it is ‘differential’ because it depends on differential equations (equations (1)
and (2)), and it is ‘evolutionary’ rather than a typical differential game because only
the mutant’s payoff is maximized rather than both the mutant’s and the resident’s
payoffs. Because equation (11) involves maximization with respect to functions (u)
rather than points, this maximization poses an optimal control problem. We solve
this problem numerically by finding a best response to the resident (optimal control
problem), setting the best response as the resident, and iterating until convergence
to a point at which the mutant and resident strategies are indistinguishable
to a chosen extent. To do so, we use the software GPOPS.
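
The best-response iteration described above can be sketched on a toy problem. The Python example below is only a schematic analogue: the strategy is a single number rather than a function of age, the payoff is a made-up quadratic (not equation (12)), and a grid search stands in for the optimal control solver. All names are hypothetical.

```python
def toy_payoff(u, v):
    """Hypothetical mutant payoff that depends on the resident strategy v."""
    return u * (1.0 - u) + 0.5 * u * v


def best_response(payoff, v, grid):
    """Strategy on the grid that maximizes the mutant payoff against resident v.

    In the model this step is an optimal control problem; a grid search is
    used here only because the toy strategy is a scalar.
    """
    return max(grid, key=lambda u: payoff(u, v))


def find_uninvadable(payoff, v0, grid, tol=1e-3, max_iter=100):
    """Iterate: find a best response, set it as the resident, repeat."""
    v = v0
    for iteration in range(1, max_iter + 1):
        u = best_response(payoff, v, grid)
        if abs(u - v) < tol:  # mutant and resident indistinguishable
            return u, iteration
        v = u  # the best response becomes the new resident
    raise RuntimeError("best-response iteration did not converge")


grid = [i / 10000 for i in range(10001)]  # feasible set U, discretized
u_star, n_iterations = find_uninvadable(toy_payoff, 0.0, grid)
```

For this toy payoff the best response to v is (1 + 0.5 v) / 2, so the iteration contracts towards the fixed point 2/3. Nothing guarantees such contraction in general, and the iteration can instead cycle or fail to converge.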
Figure specifications. For Fig. 3a, b, plots are around only ecological challenges;
that is, for a given plot, the remaining two $P_j$ values are set to zero. For social challenges,
the arrows in Fig. 3c describe the qualitative effect determined in Fig. 3a, b of
increasing the proportion of a social challenge as the proportion of ecological
challenge decreases; for ecological challenges, the arrows describe the qualitative
effect of increasing the environmental difficulty $\alpha$ as found in Extended
Data Fig. 3g, h. The patterns in Fig. 3a–c also hold around the best-fitting $P^*$ for
H. sapiens; that is, when for a given plot, the remaining two $P_j$ values are set to the values
of $P^*$ (Extended Data Fig. 4b, c). The ‘missing’ dots in Fig. 3d are $P_j$ combinations
that did not converge to an uninvadable growth strategy (for example, owing to
cycling solutions, suggesting possible evolutionary branching (female dimorphism)
in brain size) or that were unreachable from lack of convergence of nearby runs
(Supplementary Information 5). For Fig. 3a–e, cooperation is submultiplicative and
for Fig. 3d, e, competence is exponential (see Extended Data Fig. 4 for all cases).
Figure 4a shows the hominin species for which we find a near-perfect adult fit (that
is, for which the best adult fit is greater than the chosen threshold of $-D(\tau_a) = -0.05$;
Extended Data Figs. 6–8). For Fig. 4a, cooperation is submultiplicative (respectively
additive) for weakly (respectively strongly) decelerating EEE. In Fig. 4b, dots are the
values for an average H. sapiens female as previously reported. The resulting life
periods in Fig. 4b are defined as ‘childhood’, when there has not been allocation to
production of reproductive tissue from birth; ‘adolescence’, when there is allocation to
production of somatic and reproductive tissues; and ‘adulthood’, when there is only
allocation to production of reproductive tissue. The EEE from maternal provisioning
at birth (part of the R parameters) in Fig. 4b is slightly smaller than its benchmark value
to improve ontogenetic fit further without affecting adult fit (ontogenetic fit is
$-E(D(\tau)) = -0.22$ using $\varphi_0 = 0.5$ rather than $-0.33$ using the benchmark $\varphi_0 = 0.6$;
Supplementary Information 6, 8).
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 
Code availability. Code that supports the findings of this study is available in 
Zenodo with the identifier https://doi.org/10.5281/zenodo.1197479. 
Data availability. Data of predicted brain size, body size and skill level for the
various challenge combinations as generated by this study and as used for Figs. 3, 4
and Extended Data Figs. 4, 6–8 have been deposited in Zenodo with the identifier
https://doi.org/10.5281/zenodo.1197479. Complete numerical solutions, including
growth strategies across the parameter sweep, totalling 200 GB of data, have been
deposited in Zenodo with the identifier https://doi.org/10.5281/zenodo.1217123.
Extended Data Fig. 1 | Shape of EEE versus skill. a–c, Plots of EEE (e), its
speed and acceleration with respect to skill level under power competence
and exponential competence with only ecological challenges (that is,
$P_1 = 1$) for the parameter values used and without maternal provisioning
(that is, $\varphi = 0$, so that $e = S_1$). For comparison, the curves for power […]
birth is 1 for power competence but 0 for exponential competence.
a, b, e and its speed at birth and during young ages are smaller for
exponential competence than for power competence. c, However, the
acceleration in e at birth and at young ages is larger for exponential
competence than for power competence.
Extended Data Fig. 2 | Method implementation. a, Typical result with
convergence to an uninvadable growth strategy. For the ith best-response
iteration, the growth strategy shown is the resident (v) whose best
response ($\hat{u}^*$) is shown next, which is the resident of the (i + 1)th iteration.
Convergence to a best response to itself ($u^*$) was declared visually, in this
case, at iteration 21. b–f, Reporting variables across the best-response
iterations in a. b–e, Resulting adult body mass, brain mass, skill level and
encephalization quotient across iterations. These values tend to converge
more quickly than the growth strategy (a). f, Rather than visually declaring
convergence, convergence should ideally be declared when the difference
between mutant and resident is below a chosen threshold. However,
numerical jittering prevented the use of this criterion. For example, f
shows the maximum of $|\hat{u}^*(t) - v(t)|$ across t for each iteration in
a. Without numerical jittering, this maximum should decrease as the
growth strategy approaches a best response to itself. However, numerical
jittering causes this maximum to be at least equal to the maximum
mutation size $\delta = 0.1$. The maximum of $|\hat{u}^*(t) - v(t)|$ is occasionally greater
than $\delta$ because $\hat{u}^*$ and v have different partitions over t and we use the
following approximation: for each t in the t partitioning of $\hat{u}^*$, we find the
closest t in the t partitioning of v and calculate the difference $|\hat{u}^*(t) - v(t)|$
at these relatively close times; this may occasionally cause the difference to
be larger than $\delta$ when strategies change suddenly with t. Alternative
measures of convergence were similarly inadequate (for example,

$\sum_t |\hat{u}^*(t) - v(t)|$). g, We implement maternal provisioning differently than
before to incorporate it when there are social challenges. The difference
yields no detectable difference in predicted brain and body mass with only
ecological challenges after slightly adjusting the EEE from maternal
provisioning of a newborn ($\varphi_0$): before, $\varphi_0 = 0.6$ for power and $\varphi_0 = 0.8$
for exponential competence were used; here, $\varphi_0 = 0.4$ for power and
$\varphi_0 = 0.6$ for exponential competence were used. h, Three ways to measure
adult fit: (1) at the predicted age of adulthood ($x_B^*(t_a) - X_B(t_a)$); (2) at the
observed age of adulthood ($x_B^*(\tau_a) - X_B(\tau_a)$); and (3) at the predicted age of
adulthood for the prediction and at the observed age of adulthood for the
observation ($x_B^*(t_a) - X_B(\tau_a)$). We use option 2.
Extended Data Fig. 3 | Effects of Q and R parameters. a, b, Effects of
maintenance costs ($B_i$) on the corresponding tissue mass or skill level.
Each $B_i$ tends to decrease the value $x_i^*(\tau_a)$ for the corresponding i, but not
necessarily for the other i (see c, d). c, d, Effect of $B_i$ on adult brain mass,
body mass and encephalization quotient. With power competence (c),
when $B_b$ = 310 and 340 MJ kg⁻¹ per year (y), the predicted adult brain
mass is $x_b^*(\tau_a)$ = 1.0298 and 0.9133 kg, respectively. With exponential
competence (d), when $B_b$ = 310, 340 and 370 MJ kg⁻¹ y⁻¹, the predicted
adult brain mass is $x_b^*(\tau_a)$ = 1.542, 1.3973 and 1.2767 kg, respectively.

e, f, Effects of $B_r$ when $B_r$ is small. When $B_r$ varies between 70 and
2,700 MJ kg⁻¹ y⁻¹, $B_r$ has no detectable effect on adult brain mass and
encephalization quotient. g, h, Ontogenetic fit with H. sapiens around the 
used values for each of the R parameters (except 5). The ontogenetic fit is 
approximately maximized around the benchmark values chosen 
previously, which are also used here (except for $\varphi_0$, given our improved
implementation of $\varphi$). i, Effect of $B_r$ on the predicted life history with
exponential competence. In the left column, from top to bottom, as $B_r$
decreases, the allocation to the growth of reproductive tissue during
adolescence increases ($u_r$ between $t_m$ and $t_a$) and adolescence shortens. In the
central column, the increased allocation to the growth of reproductive tissue
increases the mass of reproductive tissue, but brain mass does not change with
$B_r$ for $B_r$ > 70 MJ kg⁻¹ y⁻¹. In the right column, as the mass of reproductive
tissue increases, body mass increases slightly, which is more noticeable
for $B_r$ < 100 MJ kg⁻¹ y⁻¹. An exceedingly small $B_r$ (<70 MJ kg⁻¹ y⁻¹)
disrupts the predicted life history, which with $B_r$ = 60 MJ kg⁻¹ y⁻¹ is
severely different from that of H. sapiens (for example, there is brain
growth late in life and reproductive growth from birth). Similar results
arise for even smaller $B_r$. In a–i there are only ecological challenges and we
use the previous definition of $\varphi$.
Extended Data Fig. 4 | Effects of challenge types on brain size.
a, b, Outer rows are for the cooperation cases that were considered; outer
columns are for the competence cases. a, Around the pure ecological
scenario (that is, in a given plot for $P_j$, as $P_1$ decreases, the remaining two
$P_j$ values are set to zero). b, Around the best-fitting scenario for H. sapiens (that
is, in a given plot for $P_j$, as $P_1$ decreases, the remaining two $P_j$ values are set to
the best-fitting $P^*$ found in Fig. 3d). c, Summary of the qualitative effects
of challenge types on brain size. For social challenges, the direction of
the arrows is taken from a, b. For ecological challenges, the direction
of the arrow is taken from Extended Data Fig. 3g as the environmental
difficulty $\alpha$ increases. A dash (–) indicates an approximately invariant
relationship and a dot (·) indicates insufficient data points for identifying
a relationship. The arrows in Fig. 3c are taken from this summary, in
which, for social challenges, the arrows are those of submultiplicative
cooperation. AC, additive cooperation; EC, exponential competence; MC,
multiplicative cooperation; SC, submultiplicative cooperation.
Extended Data Fig. 5 | Typical results when there is convergence to
no brain growth or when there is no convergence to an uninvadable
growth strategy. a–e, Adult values over best-response iterations for
cases of no brain growth or no convergence to an uninvadable strategy.
a, Amplifying cycle leads to no brain growth. b, Stable cycle. c, Arms
race that ends when the solver warns that the optimal control problem
(OCP) may be infeasible. This might arise if the best response to the
last iteration necessarily involves a substantially different growth
strategy, which is not allowed in the optimization as the best response is
constrained to be sufficiently similar to that in the previous iteration. It
is possible that such a substantially different best response involves either
no brain growth (for example, as seen under purely ecological challenges
when the environmental difficulty is exceedingly high (Supplementary
Information 4.4)) or substantially more allocation to brain growth (which
appears unlikely given the energetic constraints). d, A short arms race in
encephalization quotient that leads to no brain growth. e, Amplifying cycle
that ends when the solver warns that the OCP may be infeasible.
Extended Data Fig. 6 | Identification of best-fitting scenarios
across hominins. Adult fit of predicted adult brain and body mass
with those observed in a given species across parameter values for
all cases considered. Each dot’s colour gives the adult fit, $-D(\tau_a)$, for
the corresponding parameter combination and case. a, H. sapiens.
b, H. neanderthalensis. c, H. erectus.
Extended Data Fig. 7 | Identification of best-fitting scenarios across hominins, continued. See legend of Extended Data Fig. 6 for details.
a, H. heidelbergensis. b, H. ergaster. c, H. habilis.
Extended Data Fig. 9 | High fit intervals for best-fitting scenarios across
hominins. Here we show high fit intervals around the best fitting 
scenarios across hominins having a best adult fit greater than —0.05. 

a, c, e, g, i, k, For the top left plot, as P; increases, P, decreases, whereas for 
the remaining plots as $P_2$, $P_3$ and $P_4$ increase, $P_1$ decreases; for a given plot,
the remaining $P_j$ are set to the corresponding $P^*$ shown in Fig. 4a (that is,
plots are around $P^*$). The dots are the adult fit and the lines are
interpolated values using a monotone Hermite spline (splinefun with
method monoH.FC in R). The red line is $-D(\tau_a) = -0.05$. b, d, f, h, j, l,
The whiskers are the high fit intervals where adult fit is greater than −0.05
and the dots are the estimated $P^*$ giving the best adult fit for the species in
the interpolation. The cases of competence and cooperation are as found
in Extended Data Figs. 6, 7. Note that for H. habilis, the high fit intervals
may be wider as the adult fit is increasing at the end of the values of $P_2$, $P_3$
and $P_4$ for which uninvadable growth strategies were obtained.
Extended Data Fig. 10 | Detailed life history resulting from the best-fitting
scenario for H. sapiens. Plots correspond to Fig. 4b. a, The growth
strategy generating the life history. b, The resulting growth metabolic rate. 
c, d, The mass of all tissues. e, The skill level. For comparison with the 
Cellular milieu imparts distinct pathological
α-synuclein strains in α-synucleinopathies


Chao Peng, Ronald J. Gathagan, Dustin J. Covell, Coraima Medellin, Anna Stieber, John L. Robinson, Bin Zhang, Rose M. Pitkin, 
Modupe F. Olufemi, Kelvin C. Luk, John Q. Trojanowski & Virginia M.-Y. Lee* 


In Lewy body diseases (including Parkinson’s disease, without
or with dementia, dementia with Lewy bodies, and Alzheimer’s
disease with Lewy body co-pathology), α-synuclein (α-Syn)
aggregates in neurons as Lewy bodies and Lewy neurites. By
contrast, in multiple system atrophy α-Syn accumulates mainly
in oligodendrocytes as glial cytoplasmic inclusions (GCIs).
Here we report that pathological α-Syn in GCIs and Lewy bodies
(GCI-α-Syn and LB-α-Syn, respectively) is conformationally and
biologically distinct. GCI-α-Syn forms structures that are more
compact and it is about 1,000-fold more potent than LB-α-Syn in
seeding α-Syn aggregation, consistent with the highly aggressive
nature of multiple system atrophy. GCI-α-Syn and LB-α-Syn show
no cell-type preference in seeding α-Syn pathology, which raises the
question of why they demonstrate different cell-type distributions
in Lewy body disease versus multiple system atrophy. We found that
oligodendrocytes but not neurons transform misfolded α-Syn into a
GCI-like strain, highlighting the fact that distinct α-Syn strains are
generated by different intracellular milieus. Moreover, GCI-α-Syn
maintains its high seeding activity when propagated in neurons.
Thus, α-Syn strains are determined by both misfolded seeds and
intracellular environments.

The diverse nature of α-synucleinopathies suggests that they may be
caused by distinct α-Syn strains. To investigate whether GCI-α-Syn
and LB-α-Syn represent two distinct strains, sarkosyl-insoluble α-Syn
was isolated from the brains of patients with multiple system atrophy
(MSA), which exists as two subtypes, the Parkinsonian subtype (MSA-P)
and the cerebellar subtype (MSA-C), and from the brains of patients
with Lewy body disease (Extended Data Fig. 1a and Supplementary
Tables 1, 2). First we evaluated the extent of Ser129 phosphorylation
(pS129), which is a hallmark of pathological α-Syn, on GCI-α-Syn
and LB-α-Syn, and found much less pS129 on GCI-α-Syn than
on LB-α-Syn (Fig. 1a, b). Second, we analysed conformational
differences between GCI-α-Syn and LB-α-Syn using proteinase K
digestion. Proteinase K digestion showed predominantly undigested α-Syn
for GCI-α-Syn (1st band in Fig. 1c), whereas LB-α-Syn was cleaved into
smaller fragments (2nd–4th bands in Fig. 1c). The relative resistance
of GCI-α-Syn to proteinase K digestion was further confirmed using
increasing concentrations of proteinase K (Fig. 1d), which indicates
that GCI-α-Syn might form a more-compact structure than LB-α-Syn.
Epitope mapping showed that the 2nd band after proteinase K
digestion was truncated mainly at the N terminus, whereas the 3rd and 4th
bands were truncated mainly at the C terminus (Extended Data Fig. 1b,
c). GCI-α-Syn and LB-α-Syn also produced distinct banding patterns
when digested with trypsin or thermolysin, further demonstrating their
different conformations (Extended Data Fig. 1d–g).

To confirm that GCI-α-Syn and LB-α-Syn have different conformations,
we immunostained sections of diseased brains with the monoclonal
antibody (MAb) Syn7015, which is selective for a synthetic α-Syn
strain. At low concentrations Syn7015 preferentially recognized
GCIs over Lewy bodies, whereas another MAb, Syn303, that detects
pathological α-Syn immunostained GCIs and Lewy bodies equally
well (Extended Data Fig. 2a, b). Semi-quantitative analyses of Lewy
bodies and GCIs stained by Syn7015 or Syn303 on adjacent
sections showed that Syn7015 preferentially recognized GCIs over Lewy
bodies (Fig. 1e), which was also supported by the ratio of total area
occupied by Syn7015-positive over Syn303-positive pathology (Fig. 1f,
g and Extended Data Fig. 2c). This preferential recognition further
demonstrates that there are conformational differences between
GCI-α-Syn and LB-α-Syn.

To determine whether structural differences between GCI-α-Syn
and LB-α-Syn influence their seeding activities, we treated primary
oligodendrocytes expressing α-Syn with an equal amount of GCI-α-Syn,
LB-α-Syn or α-Syn preformed fibrils (PFFs). GCI-α-Syn is much
more potent than LB-α-Syn and PFFs at seeding α-Syn aggregation in
oligodendrocytes (Fig. 1h, i and Extended Data Fig. 3a, b). Specifically,
GCI-α-Syn is approximately 1,000-fold more potent than LB-α-Syn
and PFFs (Fig. 1j): 30 ng of LB-α-Syn induced a similar amount of
pathology as 30 pg of GCI-α-Syn, and 30 μg of PFFs were comparable
to 30 ng of GCI-α-Syn. The purity of the oligodendrocyte cultures and
the presence of α-Syn pathology in oligodendrocytes were confirmed
by immunofluorescence staining (Extended Data Fig. 3c–f).

Because the high potency of GCI-α-Syn in inducing oligodendrocyte
pathology is consistent with the distribution of this strain in
oligodendrocytes of patients with MSA, we hypothesized that the properties of
GCI-α-Syn and LB-α-Syn dictate their differential cell-type distribution
in patients (hypothesis 1 in Extended Data Fig. 3g), and speculated
that LB-α-Syn is more potent than GCI-α-Syn in inducing neuronal
pathology. Primary neurons treated with GCI-α-Syn developed many
more α-Syn inclusions than those treated with LB-α-Syn or PFFs
(Fig. 2a, b and Extended Data Fig. 4a, b), and GCI-α-Syn also was about
1,000-fold more potent than LB-α-Syn or PFFs in inducing neuronal
α-Syn pathology (Fig. 2c, d). Furthermore, using a QBI-293 cell line
expressing human α-Syn (hereafter, ‘QBI-WT-Syn cells’), we
confirmed that GCI-α-Syn is about 1,000-fold more potent than LB-α-Syn
and PFFs (Fig. 2e, f and Extended Data Fig. 4c–e). To rule out possible
contributions of contaminating proteins, GCI-α-Syn and LB-α-Syn
were further purified by immunoprecipitation.
Immunoprecipitation-purified GCI-α-Syn maintained its markedly higher seeding ability
compared to LB-α-Syn (Extended Data Fig. 4f). Moreover, the addition
of PFFs to immunoprecipitation-depleted GCI-α-Syn preparations did
not increase the seeding ability of PFFs (Extended Data Fig. 4g, h).

Because GCI-α-Syn is more resistant to proteinase K digestion
(Fig. 1c, d), we investigated whether the high potency of GCI-α-Syn is
due to its resistance to degradation. Previously we showed that
exogenously added misfolded α-Syn accumulates in lysosomes and that
treating the cells with chloroquine (a lysosomal inhibitor) could increase the
amount of α-Syn pathology that was induced. To test this hypothesis,
primary neurons seeded with GCI-α-Syn, LB-α-Syn or PFFs were
treated with chloroquine. Chloroquine treatment similarly increased
the pathology induced by GCI-α-Syn, LB-α-Syn and PFFs (Extended
Fig. 1 | GCI-α-Syn and LB-α-Syn represent two distinct strains.
a, GCI and Lewy body immunoblotted with antibodies against total
α-Syn or pS129 α-Syn (pSyn). b, Quantification of pS129 α-Syn versus
total α-Syn shown in a (GCI, n = 5 cases; LB, n = 7 cases). c, Proteinase
K-digested LB-α-Syn and GCI-α-Syn from six cases immunoblotted
with anti-α-Syn MAb (Syn211). d, GCI-α-Syn and LB-α-Syn incubated
with increasing concentrations of proteinase K (PK) and immunoblotted
with Syn211 (experiment repeated three times). e, Semi-quantitative
scores (0–3) to quantify α-Syn pathology revealed by Syn303 or Syn7015
immunohistochemistry in adjacent brain sections of patients with MSA
or Lewy body disease (LB, n = 9 cases; GCI, n = 7 cases) (statistics:
Mann-Whitney U-test). f, Quantification of area occupied by
Syn7015-positive (Syn7015+) versus Syn303-positive (Syn303+) α-Syn pathology
for experiments in e (LB, n = 9 cases; GCI, n = 7 cases). g, Representative
photomicrographs for experiments in e (repeated with seven cases).


Data Fig. 4i, j), which suggests that the high potency of GCI-α-Syn is
not likely to be due to its resistance to lysosomal proteolysis.

The markedly different seeding abilities of GCI-α-Syn and LB-α-Syn
in vitro prompted us to test their properties in vivo. Equivalent
amounts of GCI-α-Syn or LB-α-Syn were injected into the striatum
of wild-type mice”®. At three months post-injection, only
GCI-α-Syn-injected wild-type (GCI-WT) mice developed abundant neuronal
inclusions (Fig. 2g). At six months post-injection, although
the amount of α-Syn pathology in GCI-WT mice declined markedly,
only a limited number of neurons developed α-Syn pathology
in LB-α-Syn-injected wild-type (LB-WT) mice (Extended Data
Fig. 4k). Therefore, GCI-α-Syn more potently induces neuronal
pathology in vivo, which also argues against cell-type-specific seeding
by GCI-α-Syn. Furthermore, no oligodendroglial pathology was
detected in GCI-WT mice, which further argues against the hypothesis
that the properties of GCI-α-Syn dictate its oligodendrocyte
distribution. On the other hand, the high potency of GCI-α-Syn
probably contributes to the aggressive nature of MSA.

Misfolded α-Syn was shown to spread through the neuroanatomical
connectome, but how the properties of different α-Syn strains affect this
spreading process is unclear. We compared the transmission pattern of
α-Syn pathology in wild-type mice injected with GCI-α-Syn, LB-α-Syn

h, Primary oligodendrocytes expressing α-Syn-mCherry incubated with
13 ng GCI-α-Syn, LB-α-Syn or PFFs were stained with 81A (pS129 α-Syn)
and anti-olig2 (experiment repeated four times). i, Quantification of pS129
α-Syn induced by GCI-α-Syn, LB-α-Syn and PFF in oligodendrocytes
expressing α-Syn (GCI, n = 8 different preparations; LB, n = 9 different
preparations) (statistics: two-tailed unpaired t-test using the mean value of
each case). j, Quantification of pS129 α-Syn induced by various amounts
of PFFs, GCI-α-Syn or LB-α-Syn in oligodendrocytes expressing α-Syn
(n = 6 (LB), 4 (GCI 3 ng) or 5 (all other groups) biological replicates)
(statistics: adjusted with Bonferroni correction). Results shown as
mean ± s.e.m. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001; NS,
not significant. Scale bars: 100 μm (g, h), 25 μm (g inset), 10 μm (h inset).
AD, Alzheimer’s disease; DLB, dementia with Lewy bodies; LB, Lewy
body; PDD, Parkinson's disease with dementia. For gel source data, see
Supplementary Fig. 1. See Supplementary Table 5 for statistical details.


or PFFs. GCI-α-Syn spread more efficiently to the entorhinal cortex,
hippocampus and pyramidal layer of piriform cortex, but less efficiently
to other cortical regions such as the motor cortex, as compared
to PFFs and LB-α-Syn (Fig. 2h, i and Extended Data Figs. 4l, 5a, b).
By contrast, PFFs induced relatively more pathology in the striatum
and substantia nigra but less in the olfactory bulb, as compared with
LB-α-Syn and GCI-α-Syn (Fig. 2h, i and Extended Data Figs. 4l, 5a, b).
Therefore, our data demonstrate that different α-Syn strains markedly
modulate their transmission patterns in the nervous system.

The observation that the seeding properties of GCI-α-Syn and LB-α-Syn
do not show any cell-type preference raises the question of why they
demonstrate distinct cell-type distributions in diseased brains. Because
our data argue against the hypothesis that strain properties determine
cell-type distributions (hypothesis 1 in Extended Data Fig. 3g), we proposed
an alternative hypothesis that the different cellular environments
of neurons and oligodendrocytes promote the formation of distinct
α-Syn strains (hypothesis 2 in Extended Data Fig. 3g). According to
this hypothesis, injection of LB-α-Syn into mice expressing α-Syn in
oligodendrocytes should convert the LB-α-Syn into a GCI-like strain.
To test this hypothesis and to eliminate confounding factors arising
from neuronal pathology, we crossed 2′,3′-cyclic nucleotide 3′-phosphodiesterase
(CNP)-α-Syn transgenic mice (M2 line)”! with α-Syn

Fig. 2 | The seeding properties of GCI-α-Syn and LB-α-Syn do
not show any cell-type preference. a, pS129 α-Syn induced by 1 ng
of GCI-α-Syn, LB-α-Syn or PFFs in primary neurons (experiment
repeated seven times). b, Quantification of α-Syn pathology for
experiments in a (GCI, n = 8 different preparations; LB, n = 9 different
preparations) (statistics: two-tailed unpaired t-test using the mean value
of each case). c, d, Quantification of pS129 α-Syn induced by various
concentrations of PFFs, GCI-α-Syn or LB-α-Syn in neurons (n = 3
biological replicates) (statistics: adjusted with Bonferroni correction). e,
pS129 α-Syn induced by 2 ng of GCI-α-Syn, LB-α-Syn or PFFs in
QBI-WT-Syn cells (experiment repeated seven times). f, Quantification
of pS129 α-Syn induced by various concentrations of PFFs, GCI-α-Syn or
LB-α-Syn in QBI-WT-Syn cells (n = 7 (LB), 4 (GCI) or 3 (PFF)


knockout mice, generating a new mouse line that expresses α-Syn only
in oligodendrocytes (hereafter referred to as ‘KOM2 mice’) (Extended
Data Fig. 6a, b).

biological replicates) (statistics: adjusted with Bonferroni correction).
g, Quantification of the number of cells with α-Syn pathology in wild-type
mice inoculated with 50 ng of GCI-α-Syn or LB-α-Syn at three months
post-injection (m.p.i.) (n = 3 mice). h, i, Quantification of the distribution
of α-Syn pathology seeded by GCI-α-Syn (GCI-WT, n = 3 mice) or PFFs
(PFF-WT, n = 4 mice) at three months post-injection and LB-α-Syn
(LB-WT, n = 3 mice) at six months post-injection. Amg, amygdala; Cortex,
cortex except pyramidal layer of piriform area (PIR2) and entorhinal
cortex (Ent); Hippo, hippocampus; OB, olfactory bulb; SN, substantia
nigra; Str, striatum. Results shown as mean ± s.e.m. Statistics shown in
h, i are two-way ANOVA followed by Tukey’s honest significant difference
test. Scale bars: 100 μm (a), 50 μm (e). See Supplementary Table 5 for
statistical details.
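
As an illustration of the "adjusted with Bonferroni correction" note in the legends, the following is a minimal sketch of how raw P values could be adjusted and then mapped to the star notation defined in the Fig. 1 legend. The function names and the example P values are hypothetical and are not taken from the paper; the authors' actual statistical software and procedure are given in Supplementary Table 5, not here.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_label(p):
    """Map a P value to the star notation used in the figure legends."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# Hypothetical raw P values for three pairwise comparisons (not from the paper).
raw = [0.004, 0.02, 0.3]
adjusted = bonferroni_adjust(raw)
labels = [significance_label(p) for p in adjusted]
```

With three comparisons, a raw P value of 0.02 no longer passes the 0.05 threshold after adjustment, which is the conservative behaviour the correction is meant to provide.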


KOM2 mice were unilaterally injected with GCI-α-Syn or LB-α-Syn
into the thalamus (GCI-KOM2 and LB-KOM2 mice, respectively). At
one month post-injection, although α-Syn inclusions appeared in the

Fig. 3 | Oligodendrocyte environment generates the GCI-α-Syn strain.
a, Syn506-positive α-Syn aggregates seeded by 18.75 ng of GCI-α-Syn
or LB-α-Syn in KOM2 mice. OPT, optic tract; CP, cerebral peduncle
(experiment repeated three times). b, Quantification of the number of
oligodendrocytes with α-Syn pathology in injected KOM2 mice (one
month post-injection, n = 3; three and six months post-injection, n = 5
mice) (statistics: adjusted with Bonferroni correction). c, Quantification
of the ratio of area occupied by Syn7015-positive versus Syn303-positive
α-Syn pathology in adjacent sections of brains from patients with MSA
(MSA) or patients with Lewy body disease (LBD), injected KOM2 mouse
brains or M83 transgenic mouse brains (GCI, n = 3 cases; LB, n = 3
cases; GCI-KOM2, LB-KOM2 and M83, n = 4 mice) (statistics: one-way
ANOVA followed by Dunnett’s post hoc test comparing each group with
LB-KOM2). d, Adjacent sections of the medulla from an MSA case stained
with Syn303 and Syn7015 (repeated with four cases). Arrows: GCIs;
arrowheads: neuronal inclusions. e, Quantification of the ratio of Syn7015-positive
versus Syn303-positive GCIs and neuronal inclusions (NI) on
adjacent sections of brains from patients with MSA (n = 6 cases). Results
shown as mean ± s.e.m. Scale bars: 50 μm (a); 12.5 μm (a insets); 100 μm
(d); 10 μm (d insets). See Supplementary Table 5 for statistical details.
Fig. 4 | Oligodendrocytes convert misfolded α-Syn to a GCI-α-Syn-like
strain but neurons could not convert GCI-α-Syn to a LB-α-Syn-like
strain. a, Schematic for passaging PFFs in KOM2 mice. b, α-Syn
pathology in PFF-injected KOM2 mice (experiment repeated three
times). c, Quantification of pS129 α-Syn in QBI-WT-Syn cells seeded
by PFFs, passaged PFFs (PFF-KOM2-Syn) or PFFs combined with
insoluble fraction from uninjected KOM2 mice (PFF + KOM2) (n = 6
(PFF-KOM2-Syn), 3 (PFF) and 5 (PFF + KOM2) biological replicates).
d, Quantification of pS129 α-Syn in QBI-WT-Syn cells seeded by equal
amounts of LB-α-Syn or LB-α-Syn passaged in KOM2 mice (LB-KOM2)
(LB, n = 3 biological replicates; LB-KOM2, n = 7 biological replicates).
e, Schematic for passaging PFFs in various cells. f, Quantification of pS129
α-Syn in QBI-WT-Syn cells seeded by PFFs passaged in cells. PFF-oligo-Syn,
PFF-HipN-Syn, PFF-CtxN-Syn and PFF-QBI-Syn refer to PFFs that
have been passaged in oligodendrocytes, hippocampus neurons, cortical
neurons and QBI-WT-Syn cells, respectively (n = 6 (PFF-oligo-Syn),
4 (PFF-HipN-Syn, PFF-CtxN-Syn) and 3 (PFF) biological replicates).
g, Schematic for generating α-Syn PFFs in cell lysates. h, Quantification
of pS129 α-Syn in QBI-WT-Syn cells seeded by PFFs generated in
oligodendrocyte lysate (oligo-PFF), cortex neuron lysate (CtxN-PFF),


thalamus and fimbria of GCI-KOM2 mice, more abundant pathology
was observed in the optic tract and cerebral peduncle; that is, sites
distant from the injection site but highly enriched with oligodendrocytes,
which suggests that α-Syn pathology spreads between oligodendrocytes.
By contrast, few oligodendrocytes developed pathology in
LB-KOM2 mice, demonstrating that GCI-α-Syn is more potent than
LB-α-Syn. However, at three months post-injection, as the burden
of pathology in GCI-KOM2 mice peaked, LB-KOM2 mice began to
show substantial oligodendrocyte pathology. At six months post-injection,
whereas the pathology in GCI-KOM2 mice declined precipitously,
LB-KOM2 mice showed even more inclusions, which reached
levels comparable to GCI-KOM2 mice at three months post-injection
(Fig. 3a, b and Extended Data Fig. 7a–e). The presence of α-Syn
pathology in oligodendrocytes and the phosphorylation of S129 were

hippocampus neuron lysate (HipN-PFF) or with α-Syn monomer alone
(PFF) (n = 6 (oligo-PFF) or 4 (CtxN-PFF, HipN-PFF, PFF) biological
replicates). i, Schematic for passaging GCI-α-Syn in primary neurons.
j, Quantification of pS129 α-Syn in QBI-WT-Syn cells seeded by α-Syn
PFFs, GCI-α-Syn and GCI-α-Syn passaged in primary neurons (n = 3
(GCI, PFF), 4 (GCI-N-P2, GCI-N-P3) or 5 (GCI-N-P1) biological
replicates). k, Proteinase K-digested LB-α-Syn, GCI-α-Syn, GCI-N-P1,
GCI-N-P2 and GCI-N-P3 were immunoblotted with anti-α-Syn antibody
(experiment repeated three times). l, Quantification of insoluble pS129
α-Syn in QBI-WT-Syn cells seeded by equal amounts of GCI-α-Syn,
LB-α-Syn or GCI-α-Syn and LB-α-Syn passaged in M83 mice (GCI-M-P1
and LB-M-P1) (n = 8 (GCI, LB) or 6 (GCI-M-P1, LB-M-P1) biological
replicates). Results shown as mean ± s.e.m. Statistics shown in c, f, h and j
are one-way ANOVA followed by Dunnett’s post hoc test comparing each
group with PFF-KOM2-Syn (c), PFF-oligo-Syn (f), oligo-PFF (h) or GCI
200 pg (j). Statistics shown in l are one-way ANOVA followed by Tukey's
multiple comparisons test. Scale bars: 1 mm (b); 50 μm (b, middle inset),
15 μm (b, right inset). For gel source data, see Supplementary Fig. 1. See
Supplementary Table 5 for statistical details.


confirmed by immunostaining (Extended Data Fig. 7f, g). Mapping
of α-Syn pathology in LB-KOM2 and GCI-KOM2 mice revealed a
similar pattern (Extended Data Fig. 8). This delayed induction of α-Syn
pathology in LB-KOM2 mice supports our hypothesis that LB-α-Syn
is less potent than GCI-α-Syn, but that once initiated the subsequent
propagation within oligodendrocytes results in the formation of a
GCI-α-Syn-like strain.

Moreover, whereas Lewy bodies in the brains of patients with
Parkinson's disease were not detected by Syn7015, the oligodendrocyte
α-Syn pathologies induced by LB-α-Syn in KOM2 mice were
Syn7015-positive (Fig. 3c and Extended Data Fig. 9a), suggesting that
LB-α-Syn-induced oligodendrocyte pathology acquired properties
of the GCI-α-Syn strain. Furthermore, detailed characterization of
oligodendrocyte pathologies induced by GCI-α-Syn or LB-α-Syn
showed that they were indistinguishable, including co-localization with
p62, partial co-localization with ubiquitin and association with reactive
astrocytes (Extended Data Fig. 9b, c). Thus, LB-α-Syn can induce
oligodendrocyte pathologies in the KOM2 mice that adopt properties
of the GCI-α-Syn strain.

Both α-Syn neuronal inclusions and GCIs are present in the brains
of patients with MSA, although neuronal inclusions are uncommon.
We hypothesized that if the oligodendrocyte environment generates
the GCI-α-Syn strain, neuronal inclusions in brains from patients
with MSA would resemble Lewy bodies in brains from patients with
Parkinson's disease. To test this, adjacent sections from six cases of
MSA with neuronal inclusions in the medulla or hippocampus were
immunostained with Syn7015 and Syn303. Syn7015 preferentially
stained GCIs over neuronal inclusions (Fig. 3d, e), demonstrating
that MSA neuronal inclusions are similar to Parkinson's disease Lewy
bodies. Furthermore, Syn7015 also failed to detect neuronal inclusions
in other brain regions (such as the substantia nigra and cortex) in
additional cases of MSA (Extended Data Fig. 9d), which also provides
support for our hypothesis.

To test our hypothesis further, we injected human α-Syn PFFs into
the pons and cerebellum of KOM2 mice (Fig. 4a, b). Then, the induced
pathological α-Syn (PFF-KOM2-Syn) was recovered by sequential
extraction. Notably, PFF-KOM2-Syn was much more potent than the
PFFs themselves in inducing α-Syn pathology (Fig. 4c and Extended
Data Fig. 10a). By contrast, mixing PFFs with the sarkosyl-insoluble
fraction from uninjected KOM2 mice only slightly increased the
potency (Fig. 4c). A similar phenomenon was observed when passaging
LB-α-Syn in KOM2 mice (Fig. 4d). To test whether this effect of the cellular
milieu is unique to oligodendrocytes, PFFs were added to different
cell types including oligodendrocytes, hippocampal and cortical neurons,
and QBI-WT-Syn cells. After the induction of α-Syn pathology,
sarkosyl-insoluble α-Syn was prepared from these cells (Fig. 4e). PFFs
passaged through oligodendrocytes were more potent than PFFs passaged
through other cell types (Fig. 4f and Extended Data Fig. 10b-d).
Taken together, these results clearly demonstrate that the oligodendrocyte
environment leads to the generation of the GCI-α-Syn strain.

To test whether the generation of the GCI-α-Syn strain depends on
cell structures or specific factors, we prepared cell lysates from primary
oligodendrocytes and neurons in which cell structures were disrupted
but cell factors were preserved. We incubated each lysate with an equal
amount of α-Syn monomer to generate α-Syn PFFs (Fig. 4g). PFFs generated
in oligodendrocyte lysates were able to induce several-fold more
pathology compared with PFFs generated in neuron lysates or with
α-Syn monomers alone, supporting the hypothesis that the generation
of the GCI-α-Syn strain relies on specific factors in oligodendrocytes
(Fig. 4h).

Lastly, we asked whether the neuronal environment could convert
the GCI-α-Syn strain to the LB-α-Syn strain. Primary mouse neurons
were treated with GCI-α-Syn, and the induced α-Syn pathology
was enriched by sequential extraction (pathological α-Syn labelled
GCI-N-P1, GCI-N-P2 and GCI-N-P3 over three rounds of passaging)
(Fig. 4i). Using enzyme-linked immunosorbent assays (ELISAs)
that detect only human α-Syn or both human and mouse α-Syn, we
estimated that the majority (>99.7%) of pathological α-Syn in GCI-N-P1
was derived through recruitment of mouse α-Syn (Supplementary
Table 3). GCI-N-P1 was as potent as GCI-α-Syn (Fig. 4j). Furthermore,
the GCI-α-Syn strain was repetitively passaged in neurons such that
even after three rounds of passaging GCI-N-P3 still maintained the
marked activity of the GCI-α-Syn strain (Fig. 4j and Extended Data
Fig. 10e–g). Proteinase K digestion revealed that after passaging in
neurons, the GCI-α-Syn conformation was maintained (Fig. 4k).
Furthermore, when both GCI-α-Syn and LB-α-Syn were passaged
in M83 mice expressing mutant human (A53T) α-Syn”’, their different
seeding properties were maintained (Fig. 4l). Thus, we conclude
that the GCI-α-Syn strain can be maintained in a neuronal milieu.
Combined with the observation that neuronal inclusions in MSA are
similar to LB-α-Syn, our data suggest that GCI-α-Syn rarely transmits
from oligodendrocytes to neurons in MSA brains, although we cannot
exclude the possibility that GCI-α-Syn transmits to neurons, but that
these neurons died.

Because LB-α-Syn could induce α-Syn pathology in oligodendrocytes
expressing α-Syn in vitro and in vivo, the lack of GCIs in Lewy
body diseases is unlikely to be due to the strain properties of LB-α-Syn,
but might be attributable to the lack of α-Syn in oligodendrocytes”.
The source of α-Syn in oligodendrocytes in MSA is still unclear. Two
hypotheses have previously been proposed: (1) oligodendrocytes pathologically
overexpress α-Syn in MSA™ and (2) oligodendrocytes take up
α-Syn from neurons”’. GCI-α-Syn also could not induce α-Syn pathology
in oligodendrocytes cultured in medium with a high concentration
of α-Syn monomer (data not shown), suggesting that internalization
of exogenous α-Syn from the environment might be insufficient for
GCI formation in oligodendrocytes.
METHODS


No statistical methods were used to predetermine sample size. The investigators 
were blinded to allocation during all quantification experiments involving manual 
counting. 

Recombinant α-Syn purification and in vitro fibrillization. Full-length human
and mouse α-Syn (1-140) proteins were expressed in BL21 (DE3) RIL cells and
purified as previously described”. Fibrillization was conducted by diluting
recombinant α-Syn to 5 mg/ml in sterile Dulbecco's PBS (Cellgro, Mediatech; pH adjusted
to 7.0) followed by incubating this recombinant α-Syn at 37 °C with constant
agitation at 1,000 r.p.m. for 7 days. Successful α-Syn fibrillization was verified by
sedimentation test and thioflavin T-binding assay as described?”.
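
The dilution to 5 mg/ml follows the usual C1 × V1 = C2 × V2 relation. The sketch below shows that arithmetic; the stock concentration (12.5 mg/ml) and final volume (500 μl) are hypothetical values chosen for illustration, since the paper does not state them, and the function name is likewise an assumption.

```python
def dilution_volumes(stock_mg_per_ml, target_mg_per_ml, final_volume_ul):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2, where V2 is the final volume.
    """
    if target_mg_per_ml > stock_mg_per_ml:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_mg_per_ml * final_volume_ul / stock_mg_per_ml
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical stock at 12.5 mg/ml diluted in PBS to the 5 mg/ml used for fibrillization.
stock_ul, pbs_ul = dilution_volumes(12.5, 5.0, 500.0)
```

For these example inputs, 200 μl of stock would be combined with 300 μl of Dulbecco's PBS.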

Preparation of sarkosyl-insoluble fractions from disease and control brains. All
human brain tissues are from the CNDR brain bank”. Preparations of the
sarkosyl-insoluble fraction were performed as previously described? and outlined in
Extended Data Fig. 1a. In brief, brain regions with abundant α-Syn inclusions from
patients with MSA, PDD, DLB or AD were identified by post mortem histological
examination». The brains of patients with Alzheimer’s disease were selected for
the presence of abundant Lewy bodies in addition to plaques and tangles. Frozen
brain tissues from the identified regions were homogenized in high-salt (HS) buffer
(50 mM Tris-HCl pH 7.4, 750 mM NaCl, 10 mM NaF, 5 mM EDTA) with protease
and protein phosphatase inhibitors, incubated on ice for 20 min and centrifuged at
100,000g for 30 min. The pellets were then re-extracted with HS buffer, followed by
sequential extractions with 1% Triton X-100-containing HS buffer and 1% Triton
X-100-containing HS buffer with 30% sucrose. The pellets were then re-suspended
and homogenized in 1% sarkosyl-containing HS buffer, rotated at 4 °C overnight
and centrifuged at 100,000g for 30 min. The resulting sarkosyl-insoluble pellets
were washed once with Dulbecco's PBS and re-suspended in Dulbecco's PBS by
brief sonication (QSonica Microson XL-2000; 20 pulses; setting 2; 0.5 s per pulse).
This suspension was termed the ‘sarkosyl-insoluble fraction’, which contained
pathological α-Syn and was used for the cellular and in vivo assays described in
this Letter. The amount of α-Syn, tau, Aβ 40 and Aβ 42 in the sarkosyl-insoluble
fractions was determined by sandwich ELISAs (see ‘Sandwich ELISA’) and the
protein concentrations were examined by bicinchoninic acid assay. Proteinase K,
trypsin and thermolysin digestion was performed as previously described’.
Sandwich ELISA. The concentrations of tau, Aβ 1-40 and Aβ 1-42 in the
sarkosyl-insoluble fraction of human brain extractions were measured using sandwich
ELISA as previously described” *°, with the following combinations of capture and
reporting antibodies: Tau5/BT2+HT7 for tau, Ban50/BA27 for Aβ 1-40, Ban50/BC05
for Aβ 1-42.

To measure the concentration of α-Syn, 384-well Nunc Maxisorp clear plates
were coated with 100 ng (30 μl per well) Syn9027, a MAb to α-Syn, in Takeda buffer
and incubated overnight at 4 °C. The plates were washed 4× with PBS containing
1% Tween 20 (PBS-T), and blocked using Block Ace solution (90 μl per well) (AbD
Serotec) overnight at 4 °C. Then, the plates were incubated with brain lysates at
4 °C overnight using recombinant α-Syn monomer as standards. The plates were
then washed with PBS-T and a rabbit monoclonal anti-α-Syn antibody, MJF-R1
(1:1,000, 30 μl per well) was added to each well and incubated at 4 °C overnight.
After washing, goat-anti-rabbit-IgG conjugated to horse radish peroxidase (Cell
Signaling Technology, 1:15,000, 30 μl per well) was added to the plates followed by
incubation for 2 h at room temperature. Following another wash, the plates were
developed for 10-15 min using 1-Step Ultra TMB-ELISA substrate solution (Fisher
Scientific, 30 μl per well), the reaction was quenched using 10% phosphoric acid
(30 μl per well) and plates were read at 450 nm on a Molecular Devices Spectramax
M5 plate reader.
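
Converting a 450 nm absorbance reading into an α-Syn concentration requires a standard curve built from the recombinant monomer standards. The paper does not say how that curve was fitted, so the sketch below uses simple piecewise-linear interpolation purely as an assumption (four-parameter logistic fits are also common for ELISA data). The function name and the standard values are hypothetical.

```python
def interpolate_concentration(absorbance, standards):
    """Estimate concentration from absorbance by piecewise-linear interpolation.

    standards: list of (concentration, absorbance) pairs whose absorbance
    increases strictly with concentration.
    """
    pts = sorted(standards, key=lambda s: s[1])
    if not pts[0][1] <= absorbance <= pts[-1][1]:
        raise ValueError("absorbance outside the range of the standards")
    for (c_lo, a_lo), (c_hi, a_hi) in zip(pts, pts[1:]):
        if a_lo <= absorbance <= a_hi:
            frac = (absorbance - a_lo) / (a_hi - a_lo)
            return c_lo + frac * (c_hi - c_lo)


# Hypothetical standards: (ng/ml of recombinant monomer, absorbance at 450 nm).
standards = [(0.0, 0.0), (1.0, 0.25), (2.0, 0.5), (4.0, 1.0)]
estimate = interpolate_concentration(0.375, standards)
```

Readings outside the range of the standards raise an error rather than being extrapolated, which mirrors the usual practice of re-diluting such samples.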
Cell cultures. Primary mouse neurons were prepared from the hippocampus of
embryonic day (E) 15-E17 CD1 mouse embryos as previously described’’. PFFs
and sarkosyl-insoluble α-Syn fractions were diluted in Dulbecco's PBS (without
Mg²⁺ or Ca²⁺) and sonicated (QSonica Microson XL-2000; 60 pulses; setting 1.5;
0.5 s per pulse). Neurons were then treated with PBS, sonicated PFFs or the α-Syn
sarkosyl-insoluble fractions at 10 days in vitro (DIV) and collected for
immunocytochemistry at 14 days post-treatment. To passage GCIs in primary neurons, five
million neurons were plated per 10-cm dish, treated with 30 ng of sonicated
GCI-α-Syn at DIV 10 and collected at DIV 24 by sequential extraction with the same
protocol as described for human brain, except that HS buffer (50 mM Tris-HCl
pH 7.4, 750 mM NaCl, 10 mM NaF, 5 mM EDTA) containing 1% Triton X-100 was
used in the initial extraction. The amount of total α-Syn in the sarkosyl-insoluble
fractions was determined by sandwich ELISA with antibodies against both mouse
and human α-Syn (Syn9027 and HuA) and the amount of human α-Syn was examined
by sandwich ELISA using MJF-R1, which recognizes only human α-Syn, as
the reporter antibody. Proteinase K digestion was performed as described above
for GCI-α-Syn and LB-α-Syn. Treatment of primary neurons with chloroquine
was performed as previously described’.
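
The two ELISAs described above (one detecting mouse plus human α-Syn, one detecting human α-Syn only) allow the share of pathological α-Syn recruited from the mouse neurons to be estimated by subtraction. The sketch below shows that calculation; the function name and the example readouts are hypothetical, and the actual measurements are in Supplementary Table 3.

```python
def recruited_fraction(total_ng, human_ng):
    """Fraction of total alpha-synuclein that is not of human origin.

    total_ng: amount measured by the ELISA that detects mouse and human protein.
    human_ng: amount measured by the human-specific ELISA.
    """
    if total_ng <= 0 or human_ng < 0 or human_ng > total_ng:
        raise ValueError("expected 0 <= human_ng <= total_ng and total_ng > 0")
    return 1.0 - human_ng / total_ng


# Hypothetical readouts (ng) for a passaged sample; not values from the paper.
fraction_mouse = recruited_fraction(total_ng=1000.0, human_ng=2.0)
```

With these example numbers, 99.8% of the insoluble α-Syn would be attributed to recruited mouse protein, the same kind of estimate as the >99.7% reported for GCI-N-P1.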

Primary oligodendrocytes were prepared from the cortex of neonatal Sprague
Dawley rats (Charles River Laboratories) as previously described*!. In brief,
oligodendrocyte progenitor cells were purified from mixed glial culture by
mechanical shaking. The purified oligodendrocyte progenitor cells were plated
on poly-L-lysine-coated coverslips and infected with AAV8-α-Syn-mCherry or
AAV8-α-Syn three days after plating. Then, differentiation was induced three days
after infection by culturing oligodendrocyte progenitor cells in the differentiation
medium. Treatment with sonicated α-Syn PFF and sarkosyl-insoluble fractions
was performed three days after differentiation as described above. Treated
oligodendrocytes were collected for immunocytochemistry at 14 days post-treatment.
To evaluate the purity of the oligodendrocyte culture, three independent cultures
were stained with various cell markers for astrocytes (glial fibrillary acidic protein,
GFAP), microglia (Iba1), neurons (NeuN) and oligodendrocytes (Olig2) at DIV
3, DIV 9 and DIV 23. At least three different 20× images were randomly taken
for each coverslip and at least three coverslips were analysed for each time point
for each cell marker.

To passage PFF in primary oligodendrocytes, two million oligodendrocyte
progenitor cells were plated in a 10-cm dish, infected with AAV8-Syn to express
human α-Syn, differentiated and treated with sonicated α-Syn PFFs as described
above. Cells were collected at 14 days after treatment by sequential extraction with
the same protocol as described for primary mouse neurons. The amount of total
α-Syn in the sarkosyl-insoluble fractions was determined by sandwich ELISA with
three different combinations of capture and reporting antibodies (Syn9027 + HuA,
Syn9027 + MJF-R1 and HuA + Syn211).

The culture of QBI-WT-Syn cells and treatment with misfolded α-Syn were
performed as previously described!®. To induce α-Syn pathology in QBI-WT-Syn
cells, one million QBI-WT-Syn cells were plated in a 6-cm dish and treated with
PFFs two days later. Cells were collected at three days post-treatment by sequential
extraction with the same protocol as described for primary mouse neurons. The
amount of total α-Syn in the sarkosyl-insoluble fractions was determined by
sandwich ELISA with three different combinations of capture and reporting antibodies
(Syn9027 + HuA, Syn9027 + MJF-R1 and HuA + Syn211).

Primary hippocampus and cortical neurons of Sprague Dawley rats were generated
from the Neuron Culture Service Center at University of Pennsylvania. To
passage PFFs in primary rat neurons, 1.2 million hippocampus or cortical neurons
were plated in a 6-cm dish, infected with AAV8-α-Syn at DIV 3 to express human
α-Syn and treated with PFFs at DIV 6 as described above. Cells were collected at 14
days post-treatment by sequential extraction with the same protocol as described
for primary mouse neurons. The amount of total α-Syn in the sarkosyl-insoluble
fractions was determined by sandwich ELISA with three different combinations
of capture and reporting antibodies (Syn9027 + HuA, Syn9027 + MJF-R1 and
HuA + Syn211).
Stereotaxic injection of sarkosyl-insoluble fraction of pathological α-Syn and
α-Syn PFFs. Sarkosyl-insoluble fractions from diseased brains were diluted in
sterile Dulbecco's PBS to reach the same concentration of α-Syn in the samples and
sonicated as described above. Two-to-three-month-old C57BL6/C3H wild-type
mice or 3-4-month-old KOM2 mice were anaesthetized with ketamine hydrochloride
(100 mg/kg), xylazine (10 mg/kg) and acepromazine (0.1 mg/kg). For
wild-type mice, 50 ng of sarkosyl-insoluble pathological α-Syn from two different
MSA brains and one brain of a patient with PDD (‘PDD brain’) or 6.25 μg mouse
PFFs in 2.5 μl Dulbecco's PBS was stereotaxically injected into the dorsal striatum
(coordinates: 0.2 mm relative to bregma, +2.0 mm from midline, +3.2 mm beneath
the surface of skull) with 10-μl syringes (Hamilton) at a rate of 0.4 μl per min. For
KOM2 mice, 18.75 ng of pathological α-Syn from three different MSA brains, two
different PDD brains and one brain of a patient with DLB (‘DLB brain’) in 2.5 μl
Dulbecco's PBS was stereotaxically injected into the thalamus, an area of relatively
high oligodendrocyte α-Syn expression in these mice (coordinates: −2.5 mm relative
to bregma, +2.0 mm from midline, +3.4 mm beneath the surface of skull)
at a rate of 0.4 μl per min. Animals were then euthanized at 1, 3 and 6 months
post-injection.
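
The doses, volumes and infusion rate given above imply a working concentration and an infusion time for each injection. The sketch below works through that arithmetic for the stated values (50 ng or 18.75 ng in 2.5 μl at 0.4 μl per min); the function name is hypothetical and the derived numbers are simple consequences of the stated figures, not values reported by the authors.

```python
def injection_summary(mass_ng, volume_ul, rate_ul_per_min):
    """Return the implied concentration (ng/ul) and infusion time (min)."""
    if volume_ul <= 0 or rate_ul_per_min <= 0:
        raise ValueError("volume and rate must be positive")
    return {
        "conc_ng_per_ul": mass_ng / volume_ul,
        "minutes": volume_ul / rate_ul_per_min,
    }


# Wild-type mice: 50 ng of pathological alpha-synuclein in 2.5 ul at 0.4 ul per min.
wild_type = injection_summary(50.0, 2.5, 0.4)
# KOM2 mice: 18.75 ng in 2.5 ul at the same rate.
kom2 = injection_summary(18.75, 2.5, 0.4)
```

Under these inputs the wild-type injection corresponds to 20 ng/μl and the KOM2 injection to 7.5 ng/μl, each delivered over about 6.25 min.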

To passage PFFs in KOM2 mice, 5 mg/ml PFFs were sonicated as described
above and stereotaxically injected into the pons and cerebellum of the KOM2
mice (coordinates: −5.45 mm relative to bregma, +1.1 mm from midline, +5 mm
beneath the surface of skull for pons and +2.6 mm for cerebellum). PFFs were
injected into pons (1 μl of PFFs) and cerebellum (1.5 μl) at a rate of 0.1 μl per min.
Mice were euthanized at 3-8 months post-injection and the pons and cerebellum
were either processed for histological studies or frozen for sequential extraction.
Sequential extractions were performed with the same protocol as described for
human brain except that the two rounds of extraction of HS buffer were omitted
and the extraction began with 1% Triton-containing HS buffer. The amount of
pathological α-Syn in the sarkosyl-insoluble fraction was examined by sandwich
ELISAs with two different combinations of capture and reporting antibodies
(9027 + MJF-R1 and Syn211 + HuA). To passage LB-α-Syn in KOM2, 2.5 μl of
LB-α-Syn at the concentration of 7.52–15.87 ng/μl was injected into the pons and
cerebellum or the thalamus of KOM2 mice. Mice were euthanized at 3-8 months
post-injection and the brains were sequentially extracted and analysed in the same
way as PFF-injected KOM2 mice. To passage LB-α-Syn and GCI-α-Syn in M83
mice, 2.5 μl of GCI-α-Syn or LB-α-Syn at the concentration of 7.52 ng/μl was
injected bilaterally into the striatum of M83 mice. Mice were euthanized at one
month post-injection and the brains were sequentially extracted and analysed in
the same way as PFF-injected KOM2 mice.

Immunohistochemistry. For histological studies, animals were transcardially
perfused with PBS and the brain and spinal cord were removed and fixed in 70%
ethanol (in 150 mM NaCl, pH 7.4) overnight before being processed for paraffin
embedding. Immunohistochemistry was performed on 6-μm-thick sections as
previously described** **. The names and dilutions of the primary antibodies used
here are shown in Supplementary Table 4. Stained sections were digitized using
a Perkin Elmer Lamina scanner at 20× magnification. For quantitative analysis
of α-Syn pathology in the mouse brain, Syn506-stained sections spanning the
entire mouse brain at ~120-μm intervals were counted manually for the total number
of cells with α-Syn-positive inclusions. The semiquantitative heat maps were
generated as previously described* using Syn506-stained slides. To quantify the
spreading of pathological α-Syn in wild-type mice, representative Syn506-stained
brain sections at bregma 4.28, 2.10, 0.98, −0.22, −1.22, −2.18, −2.92, −3.52 and
−4.48 mm were counted manually for the number of cells with α-Syn pathology
in each brain region. For GCI-WT and PFF-WT mice, 1-3 brain sections at each
bregma point were quantified and the mean value of these 1-3 brain sections
was used for quantification. For LB-WT mice, because of the low amount of
pathology, 3-13 brain sections at each bregma point were quantified.
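
The per-level averaging described above (a mean over the 1-3 sections counted at each bregma point) can be sketched as follows. The data layout, the function name and the counts are all hypothetical; only the bregma levels are taken from the text.

```python
from statistics import mean


def mean_counts_per_level(section_counts):
    """Average manual cell counts over the sections sampled at each bregma level.

    section_counts: dict mapping bregma position (mm) to a list of counts,
    one count per quantified section. Levels with no sections are skipped.
    """
    return {
        level: mean(counts)
        for level, counts in section_counts.items()
        if counts
    }


# Hypothetical counts of cells with inclusions at two of the listed bregma levels.
counts = {0.98: [12, 15, 18], -0.22: [7]}
per_level = mean_counts_per_level(counts)
```

Averaging within each level before comparing animals keeps a level with three counted sections from weighing more heavily than a level with one.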

For the quantification of neuronal inclusions in cases of MSA, adjacent sections
of medulla or hippocampus from six cases of MSA were stained with Syn7015 and
Syn303. The stained sections were digitized and the number of neuronal inclusions
labelled by each antibody was quantified manually for each inferior olivary nucleus
in medulla and dentate gyrus in hippocampus. To quantify the number of stained
GCIs, three to six 10× images were randomly sampled in the white matter region
near each inferior olivary nucleus or dentate gyrus. The positions of these 10×
images were digitally labelled by drawing lines on the boundaries, and the same
10× images were sampled on adjacent sections by copying these annotations using
the digitized image, and the total number of GCIs labelled by Syn7015 or Syn303
in these sampled areas was counted manually.

To grade the α-Syn pathology revealed by Syn7015 or Syn303 immunohistochemistry
in MSA and LB-spectrum α-synucleinopathies, adjacent sections of
hippocampus, frontal cortex, substantia nigra, cerebellum or midbrain from three
cases of AD, three cases of DLB, three cases of PDD, four cases of MSA-C and
three cases of MSA-P (see Supplementary Table 1) were stained with Syn7015
and Syn303 and graded by experienced pathologists. To quantify the ratio of the
total amount of Syn7015-positive versus Syn303-positive GCIs or Lewy bodies, the
stained adjacent sections were digitized and the total area occupied by
Syn7015-positive or Syn303-positive α-Syn inclusions was quantified using an
automated threshold-based algorithm in HALO software
(http://www.indicalab.com/halo/). Then, the ratios of total Syn7015-positive area
versus Syn303-positive area were calculated. For the serial dilution of Syn7015
and Syn303, all the ratios were calculated against the area occupied by
immunohistochemistry conducted with the highest concentration of Syn303
(45 ng ml⁻¹).
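
The normalization to the highest Syn303 concentration can be sketched as below. The function name, the dictionary layout and all area values are assumptions made for illustration. The areas themselves would come from the HALO threshold analysis, which is not reproduced here.

```python
def normalise_areas(areas, reference):
    """Express each stained area as a ratio to the reference condition."""
    ref_area = areas[reference]
    if ref_area <= 0:
        raise ValueError("reference area must be positive")
    return {condition: area / ref_area for condition, area in areas.items()}


# Hypothetical areas (arbitrary units) keyed by (antibody, concentration in ng/ml).
areas = {
    ("Syn303", 45.0): 200.0,
    ("Syn7015", 45.0): 150.0,
    ("Syn7015", 5.0): 50.0,
}
ratios = normalise_areas(areas, reference=("Syn303", 45.0))
print(ratios)
```

Using one reference per case puts every dilution of both antibodies on the same scale, so cases with different overall pathology burdens can be compared.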

Animals. Two-to-three-month-old C57BL6/C3H wild-type mice were purchased
from the Jackson Laboratories (Bar Harbour). M2 mice expressing wild-type
human α-Syn in oligodendrocytes under the control of the CNP promoter have
previously been described”. KOM2 mice were generated by crossing M2 mice
with Snca−/− mice*. Three-to-four-month-old KOM2 mice were used for
the study. All breeding, housing, and experimental procedures were performed
according to the NIH Guide for the Care and Use of Experimental Animals and
approved by the University of Pennsylvania Institutional Animal Care and Use
Committee (IACUC). Both male and female mice were used for this study. Mice
were randomly assigned to each experimental group. Roughly equal numbers of
male and female KOM2 mice were used in each group. All the GCI-WT and
LB-WT mice were female. All the three-months-post-injection PFF-WT mice were
female. Of the six-months-post-injection PFF-WT mice, three were male
and one was female.

Immunocytochemistry and quantification. For regular immunocytochemistry,
cells were fixed in 4% paraformaldehyde (PFA) for 15 min followed by
permeabilization with 0.1% Triton X-100 for 15 min. To examine α-Syn aggregates, cells
were fixed with 4% PFA containing 1% Triton X-100 for 15 min to remove soluble
proteins. Fixed coverslips were blocked with 3% BSA and 3% FBS for 1 h at room
temperature and incubated with specific primary antibodies (Supplementary
Table 4) at 4 °C overnight followed by staining with secondary antibodies for 2 h at
room temperature. After mounting with Fluoromount G with DAPI (eBioscience),
coverslips were scanned on a Perkin Elmer Lamina scanner. The total amount of
81A signal, the total amount of MAP2 signal for neuronal cultures and the total
number of DAPI-positive nuclei for oligodendrocytes and QBI-WT-Syn cells were
quantified using Indica Labs HALO software. For cells cultured in 96-well plates,
cells were incubated in DAPI solution after staining with secondary antibodies.
Then, plates were scanned with In Cell Analyzer 2200 (GE Healthcare) and
analysed using the accompanying software (In Cell Toolbox Analyzer).

Purification and depletion of α-Syn from the sarkosyl-insoluble fraction
by immunoprecipitation. Control mouse IgG (Sigma) or Syn9027 MAb (an
in-house-generated MAb against α-Syn; epitope aa 130–140) were coupled to
tosyl-activated Dynabeads (Invitrogen) or NHS-activated magnetic beads (Thermo
Scientific) following the manufacturer’s instructions. For immunoprecipitation
purification, sarkosyl-insoluble fractions from diseased brains were incubated
with control IgG-coupled beads in Dulbecco's PBS and rotated at 4 °C overnight.
The resulting supernatant was then incubated with Syn9027-coupled beads in a
rotator at 4 °C overnight to capture α-Syn. The following day, the Syn9027 beads
were washed 3 times with Dulbecco's PBS and incubated with 0.1 M ethanolamine
(pH 11.5) for 3–7 min at 55 °C to elute the bound α-Syn, which was then
neutralized immediately with 1 M Tris (pH 7.0). The eluted samples were stored
at −80 °C until use. For immunoprecipitation depletion, the sarkosyl-insoluble
fractions from diseased brains were incubated with Syn9027-coupled beads at 4 °C
overnight. The resulting supernatants were incubated with Syn9027 beads again
for a second round of α-Syn depletion and the final supernatants were stored at
−80 °C until use.

Evaluating α-Syn expression in KOM2 mice. Total proteins were extracted from
KOM2 and wild-type mice by sonicating the brain in 1% Triton X-100-containing
HS buffer with phosphatase and protease inhibitors, and centrifuging at 100,000g
for 30 min at 4 °C. The protein concentration of the supernatant was determined by
BCA assay and the same amount of protein from each mouse was resolved on a 12%
Bis-Tris gel and immunoblotted with antibodies against total α-Syn, mouse α-Syn
or β-tubulin.

Generation of PFFs in different cell lysates. Neuron and oligodendrocyte cell
lysates were prepared by sonicating primary rat hippocampal or cortical neurons at
DIV 12 and oligodendrocyte cultures at DIV 10–12 in Dulbecco's PBS. The protein
concentrations of cell lysates were evaluated by BCA assay and adjusted to
1.86 mg/ml. α-Syn monomer was added to these cell lysates at a final concentration
of 500 µg/ml and shaken at 37 °C with constant agitation at 1,000 r.p.m. for 14 days.
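
As a worked example of the dilution arithmetic (C1 × V1 = C2 × V2), the sketch below computes how much monomer stock gives a final concentration of 500 µg/ml. The stock concentration (5 mg/ml) and reaction volume (100 µl) are assumed values chosen for illustration; the text does not state them.

```python
def stock_volume_needed(final_conc, final_volume, stock_conc):
    """Volume of stock to add so the mixture reaches final_conc (C1*V1 = C2*V2).

    Concentrations must share one unit (here mg/ml); the result is in the
    same unit as final_volume.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume / stock_conc


# Assumed values: 5 mg/ml monomer stock and a 100 µl reaction (not from the text).
monomer_ul = stock_volume_needed(final_conc=0.5, final_volume=100.0, stock_conc=5.0)
lysate_ul = 100.0 - monomer_ul
print(monomer_ul, lysate_ul)
```

Adding the monomer also dilutes the lysate slightly, so a protocol following this sketch would need to decide whether 1.86 mg/ml refers to the lysate before or after mixing.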

Statistical analysis. Unless specified otherwise, a two-tailed unpaired Student's
t-test was used for all the comparisons in the study, and differences with P values
of less than 0.05 were considered significant. For each t-test, an F test was
also performed to evaluate the differences in variances. If there was a significant
difference in variances (P < 0.05 by F test), Welch’s correction on the t-test was
performed. Multiple comparisons were adjusted with Bonferroni correction. Detailed
information regarding statistical analyses is provided in Supplementary Table 5.
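
The decision flow above (Student's t-test by default, Welch's correction when the F test indicates unequal variances, then Bonferroni adjustment) can be sketched with the Python standard library. This is an illustrative sketch, not the software used in the study. All function names are hypothetical, and the P values themselves (which need the t and F distributions) are assumed to come from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(a, b):
    """F statistic for comparing two sample variances (larger over smaller)."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def student_t(a, b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    sa, sb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


def choose_test(f_test_p, alpha=0.05):
    """Use Welch's correction only when the F test is significant."""
    return welch_t if f_test_p < alpha else student_t


def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical measurements for two groups (not data from the study).
group_a, group_b = [1, 2, 3], [2, 3, 4]
t_student, df_student = student_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With equal group sizes and equal variances, as in the hypothetical data, the two statistics and their degrees of freedom coincide. They diverge as the variances or group sizes become unequal.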

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Data availability. All data generated or analysed during this study are included in 
the Letter and its Supplementary Information. 


26. Giasson, B. I., Murray, I. V., Trojanowski, J. Q. & Lee, V. M. A hydrophobic stretch
of 12 amino acid residues in the middle of α-synuclein is essential for filament
assembly. J. Biol. Chem. 276, 2380–2386 (2001).

27. Volpicelli-Daley, L. A., Luk, K. C. & Lee, V. M. Addition of exogenous α-synuclein
preformed fibrils to primary neuronal cultures to seed recruitment of
endogenous α-synuclein to Lewy body and Lewy neurite-like aggregates. Nat.
Protocols 9, 2135–2146 (2014).

28. Toledo, J. B. et al. A platform for discovery: the University of Pennsylvania
Integrated Neurodegenerative Disease Biobank. Alzheimers Dement. 10,
477–484.e1 (2014).

29. Guo, J. L. & Lee, V. M. Seeding of normal Tau by pathological Tau conformers
drives pathogenesis of Alzheimer-like tangles. J. Biol. Chem. 286, 15317–15331
(2011).

30. Lee, E. B., Skovronsky, D. M., Abtahian, F., Doms, R. W. & Lee, V. M. Secretion and
intracellular generation of truncated Aβ in β-site amyloid-β precursor
protein-cleaving enzyme expressing human neurons. J. Biol. Chem. 278,
4458–4466 (2003).

31. Richter-Landsberg, C. & Vollgraf, U. Mode of cell injury and death after
hydrogen peroxide exposure in cultured oligodendroglia cells. Exp. Cell Res.
244, 218–229 (1998).

32. Duda, J. E. et al. Immunohistochemical and biochemical studies demonstrate a
distinct profile of α-synuclein permutations in multiple system atrophy. J.
Neuropathol. 59, 830–841 (2000).

33. Luk, K. C. et al. Intracerebral inoculation of pathological α-synuclein initiates a
rapidly progressive neurodegenerative α-synucleinopathy in mice. J. Exp. Med.
209, 975–986 (2012).

34. Iba, M. et al. Synthetic tau fibrils mediate transmission of neurofibrillary tangles
in a transgenic mouse model of Alzheimer’s-like tauopathy. J. Neurosci. 33,
1024–1037 (2013).

35. Abeliovich, A. et al. Mice lacking α-synuclein display functional deficits in the
nigrostriatal dopamine system. Neuron 25, 239–252 (2000).


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


Extended Data Fig. 1 panel labels.
a, Sequential extraction scheme (the pellet is carried forward at each step):
homogenize brain tissue in 4 volumes of high-salt buffer and spin at 100,000g for
30 min; homogenize in 4 volumes of high-salt buffer and spin at 100,000g for
30 min; homogenize in 9 volumes of high-salt buffer + 1% Triton and spin at
100,000g for 30 min; homogenize in 9 volumes of high-salt buffer + 1% Triton +
30% sucrose and spin at 100,000g for 30 min; homogenize in 9 volumes of
high-salt buffer + 1% sarkosyl and spin at 100,000g for 30 min; resuspend in
9 volumes of PBS and spin at 100,000g for 30 min; the final pellet is the
sarkosyl-insoluble fraction.
c, Antibodies and epitopes (α-Syn residues): HuA, 1–140; SNL4, 2–11; NAC1, 75–91;
Syn204, 87–110; LB509, 115–122; Syn211, 121–125; Syn102, 131–140. Band intensity:
band 1, GCI>>LB; band 2, GCI<<LB; band 3, GCI<LB; band 4, GCI<LB.
d–g, Immunoblots of thermolysin- and trypsin-digested and undigested fractions
(15 kDa marker shown).

Extended Data Fig. 1 | Biochemical analysis of GCI-α-Syn and
LB-α-Syn. a, Schematic for sequential extraction of brains with
α-synucleinopathy. Diseased brain samples were sequentially extracted
with buffers of increasing extraction strength (1% Triton X-100 followed
by 1% sarkosyl) to remove soluble proteins. b, Proteinase K-digested
GCI-α-Syn and LB-α-Syn were immunoblotted with a series of antibodies
targeting specific domains of α-Syn that span the entire molecule.
c, Summary of the results for experiments described in b. d, Thermolysin-
digested and undigested sarkosyl-insoluble fractions from three cases of
Lewy body disease (LB1–LB3) and three cases of MSA (GCI1–GCI3)
were resolved on a 12% Bis-Tris gel and immunoblotted with an antibody
against α-Syn (Syn211). e, Sarkosyl-insoluble fractions from a pair of
cases (one of Lewy body disease and one of MSA) were incubated with
increasing concentrations of thermolysin (with a ratio of thermolysin to
total protein that ranged from 1.25 × 10~* to 5 × 10~*) and immunoblotted
with an antibody against α-Syn (Syn211). Undigested fractions were loaded
on the same gel. f, Trypsin-digested and undigested sarkosyl-insoluble
fractions from three cases of Lewy body disease (LB1–LB3) and three
cases of MSA (GCI1–GCI3) were resolved on a 12% Bis-Tris gel and
immunoblotted with an antibody against α-Syn (Syn211). g, Sarkosyl-
insoluble fractions from a pair of cases (one of Lewy body disease and one
of MSA) were incubated with increasing concentrations of trypsin (with
a ratio of trypsin to total protein that ranged from 1.25 × 10°* to 5 × 10°”)
and immunoblotted with an antibody against α-Syn (Syn211). Undigested
fractions were loaded on the same gel. The experiments shown in b and
d–g have been repeated three times with similar results. For gel source
data, see Supplementary Fig. 1.

Extended Data Fig. 2 | Syn7015 preferentially recognizes GCIs over
Lewy bodies. a, Immunohistochemistry using a serial dilution of
Syn303 or Syn7015 on serial sections of a DLB brain and an MSA brain.
At 45 ng ml⁻¹, Syn7015 recognized both Lewy bodies and GCIs. At lower
concentrations (particularly 1.67 ng ml⁻¹ and 0.56 ng ml⁻¹), Syn7015
preferentially recognizes GCIs over Lewy bodies (repeated with four
cases). b, Quantification of the area occupied by pathological α-Syn
stained with serial dilutions of Syn7015 or Syn303 on serial sections of
two cases of MSA-P, two cases of MSA-C, one case of AD, two cases of
PDD and two cases of DLB. The results for each case are normalized to
Syn303 staining at 45 ng ml⁻¹ (GCI, n = 4 cases; LB, n = 5 cases). c, α-Syn
pathology revealed by Syn303 or Syn7015 in adjacent sections from
cases of Lewy body disease and MSA (repeated with seven cases). Results
shown as mean ± s.e.m. *P < 0.05. Scale bars: 50 µm (a), 100 µm (c), 25 µm
(c inset). See Supplementary Table 5 for statistical details.

Extended Data Fig. 3 | GCI-α-Syn is more potent in inducing α-Syn
pathology in primary oligodendrocytes. a, Oligodendrocytes treated
with the same amount of GCI-α-Syn, LB-α-Syn or PFF were sequentially
extracted with 1% Triton X-100 lysis buffer followed by 1% sarkosyl lysis
buffer, which were combined together as the sarkosyl-soluble fraction.
The sarkosyl-insoluble pellets were resuspended in Dulbecco's PBS by
sonication. Both soluble and insoluble fractions were immunoblotted with
an antibody against total or pS129 α-Syn. b, Densitometric quantification
of insoluble versus soluble α-Syn for experiments described in a (n = 3
biological replicates). c, d, Primary oligodendrocyte cultures were
immunostained with antibodies against various cell-type specific markers:
CNP (oligodendrocytes), Olig2 (oligodendrocytes), Iba1 (microglial
cells), NeuN (neurons), GFAP (astrocytes), PLP (oligodendrocytes)
at day in vitro 3 (DIV 3) (c) or DIV 9 (d). e, Insoluble phosphorylated
α-Syn, induced in primary oligodendrocytes overexpressing α-Syn, was
co-stained with antibodies against various cell-type specific markers,
demonstrating that the cells with α-Syn pathology are oligodendrocytes.
f, Percentage of different types of cells (oligodendrocytes, microglial cells
and astrocytes) in oligodendrocyte culture, at DIV 3 (the time point of
virus infection), DIV 9 (the time point for misfolded α-Syn treatment)
and DIV 23 (the time point for fixation) (n = 3 (DIV 3) or 5 (DIV 9,
DIV 23) coverslips from three independent experiments). g, Working
hypotheses regarding the different cell-type distributions of GCI-α-Syn
and LB-α-Syn strains in diseased brains. Hypothesis 1 states that the
unique properties of GCI-α-Syn and LB-α-Syn strains determine their
different cell-type distributions. The GCI-α-Syn strain (represented
by red spheres) is more efficient in inducing α-Syn pathology in
oligodendrocytes, whereas the LB-α-Syn strain (green spheres) is more
efficient in inducing α-Syn pathology in neurons. Hypothesis 2 states that
GCI-α-Syn and LB-α-Syn strains do not have cell-type preferences and
that they could both be initiated by the same misfolded α-Syn seeds (grey
spheres), but that the different intracellular environments of neurons and
oligodendrocytes convert these α-Syn seeds to different strains. Results
shown as mean ± s.e.m. **P < 0.01. Scale bars: 100 µm (c, d); 50 µm (e).
The experiments shown in a and c–e have been repeated three times with
similar results. See Supplementary Table 5 for statistical details. For gel
source data, see Supplementary Fig. 1.

Extended Data Fig. 4 | The seeding properties of GCI-α-Syn and
LB-α-Syn do not show cell-type preference. a, Soluble and insoluble
fractions from primary neurons treated with the same amount of
GCI-α-Syn, LB-α-Syn or PFF were immunoblotted with antibodies
against total or pS129 α-Syn. b, Densitometric quantification of insoluble
versus soluble α-Syn for experiments described in a (n = 3 biological
replicates). c, Quantification of phosphorylated α-Syn in QBI-WT-Syn
cells induced by an equal amount of GCI-α-Syn (MSA-C, MSA-P),
LB-α-Syn (PDD, DLB and AD) or PFFs (GCI, n = 8 different preparations;
LB, n = 9 different preparations). d, Soluble and insoluble fractions
from QBI-Syn-WT cells treated with the same amount of GCI-α-Syn,
LB-α-Syn or PFF were immunoblotted with an antibody against total
or pS129 α-Syn. e, Densitometric quantification of insoluble versus
soluble α-Syn for experiments described in d (n = 3 biological replicates).
f, Quantification of insoluble phosphorylated α-Syn in primary neurons
induced by various concentrations of GCI-α-Syn and LB-α-Syn before or
after immunoprecipitation purification (n = 3 independent experiments).
g, Quantification of insoluble phosphorylated α-Syn in primary neurons
incubated with (1) GCI-α-Syn and LB-α-Syn preparations; (2) the same
preparations after immunoprecipitation depletion to remove α-Syn; and
(3) the depleted preparation to which the same amount of α-Syn PFFs
(1 ng) was added (n = 3 independent experiments). h, PFFs combined with
the GCI-α-Syn preparation depleted of α-Syn behave similarly to α-Syn
PFFs alone. Quantification of insoluble phosphorylated α-Syn in primary
neurons seeded by PFFs alone or PFFs combined with depleted GCI
preparation (the amount of pathology induced by immunoprecipitation-
depleted GCI preparation alone has been subtracted) (n = 3 independent
experiments). i, j, Primary neurons were treated with GCI-α-Syn, LB-α-Syn
or PFF and incubated with chloroquine (Ch) on the day of misfolded
α-Syn treatment (DPT0) or three days post-treatment (DPT3). The
amount of insoluble phosphorylated α-Syn was quantified three days
after chloroquine treatment (n = 3 (DPT0-GCI and PFF) or 4 (DPT0-LB,
DPT3) independent experiments). k, Quantification of the number of
cells with α-Syn pathology in wild-type mice inoculated with 50 ng of
GCI-α-Syn or LB-α-Syn at six months post-injection. l, Representative
photomicrographs of α-Syn pathology (stained by Syn506) in multiple
brain regions ipsilateral to the injection site in GCI-α-Syn-, PFF- and
LB-α-Syn-injected wild-type mice. Cortex, motor cortex; ENT,
entorhinal cortex. Results shown as mean ± s.e.m. *P < 0.05; **P < 0.01;
*** P < (0001; ns, not significant. Statistics shown in c represent a
two-tailed, unpaired t-test using the mean value of each case. Statistics
shown in f represent one-way ANOVA with Tukey’s multiple comparison
test. Statistics shown in g, h are two-tailed, unpaired t-tests adjusted
with Bonferroni correction for multiple comparisons. Statistics shown
in i, j are two-way ANOVA, with Sidak’s multiple comparisons test. The
experiments in a, d and l have been repeated three times with similar
results. Scale bar: 100 µm. See Supplementary Table 5 for statistical details.
For gel source data, see Supplementary Fig. 1.




Extended Data Fig. 5 panel b labels: GCI-WT (3 mpi); PFF-WT (6 mpi); asterisk,
injection site; coronal planes labelled by bregma coordinate.

Extended Data Fig. 5 | Distribution of α-Syn pathology in injected
wild-type mice. a, Representative photomicrographs of α-Syn pathologies
stained by an antibody against pS129 α-Syn (81A) in multiple brain
regions in GCI-α-Syn-injected wild-type mice (experiment repeated three
times). b, Heat map for the distribution of α-Syn pathology in wild-type
mice injected with GCI-α-Syn, PFF or LB-α-Syn. GCI-α-Syn, LB-α-Syn
and α-Syn PFF were unilaterally injected into the dorsal striatum of
wild-type mice. The seeded α-Syn pathology was analysed and graded
by immunohistochemistry with Syn506. The data are presented as
heat maps to semiquantitatively demonstrate the central nervous system
(CNS) distribution of α-Syn pathology. Each panel represents a coronal
plane (bregma 4.28, 2.10, 0.98, −0.22, −1.22, −2.18, −2.92, −3.52 and
−4.48 mm) for each treatment group (GCI-WT, n = 3 mice; PFF-WT,
n = 4 mice; LB-WT, n = 3 mice). Scale bar: 100 µm.

Extended Data Fig. 6 | Characterization of KOM2 mice. a, Brain sections
from KOM2 mice were double-labelled with antibodies against α-Syn
(LB509) and various cell-type specific markers: Olig2 (oligodendrocytes),
Iba1 (microglial cells), GFAP (astrocytes) and NeuN (neurons). In KOM2
mice, α-Syn is expressed only in oligodendrocytes. b, Brain lysates of
wild-type and KOM2 mice were immunoblotted with antibodies against
total α-Syn (Syn9027), mouse α-Syn (Cell Signalling) and β-tubulin.
Scale bars: 50 µm (a), 25 µm (a inset). The experiments in a, b have
been repeated three times with similar results. For gel source data, see
Supplementary Fig. 1.

Extended Data Fig. 7 | Induction of oligodendroglial α-Syn pathology
in KOM2 mice. a, Syn506-positive α-Syn aggregates seeded by equal
amounts of GCI-α-Syn or LB-α-Syn (18.75 ng) in KOM2 mice at one,
three and six months post-injection in fimbria and thalamus. b–e,
Quantification of the number of oligodendrocytes with α-Syn pathology
in the optic tract (b), cerebral peduncle (c), fimbria (d), and thalamus
(e) at different time points (one month post-injection, n = 3 mice; three
and six months post-injection, n = 5 mice). f, Brain sections from
GCI-α-Syn-injected KOM2 mice were double-labelled with antibodies against
misfolded α-Syn (Syn506) and various cell-type specific markers. The
induced α-Syn pathologies are located in oligodendrocytes in KOM2
mice. g, GCI-α-Syn-injected KOM2 mouse brain sections were stained
with an antibody against phosphorylated α-Syn (81A). Results shown as
mean ± s.e.m. Scale bars: 50 µm (a, f and g), 12.5 µm (a insets) and 25 µm
(f insets). The experiments in a, f and g have been repeated three times
with similar results. Statistics shown in b, c are two-tailed unpaired
t-tests adjusted with Bonferroni correction. See Supplementary Table 5
for statistical details.

Extended Data Fig. 8 | Distribution of α-Syn pathology in injected
KOM2 mice. Heat maps to semiquantitatively demonstrate the CNS
distribution of α-Syn pathology in KOM2 mice unilaterally injected
with GCI-α-Syn or LB-α-Syn into the thalamus. α-Syn pathologies were
analysed and graded by immunohistochemistry with Syn506 at one month
(n = 3), three months (n = 5) and six months (n = 5) post-injection. Each
panel represents a coronal plane (bregma −1.22, −2.18, −2.92, −3.52,
−4.48 mm) for each treatment group. Because there is no α-Syn pathology
on the contralateral side, only the ipsilateral side is shown.

Extended Data Fig. 9 | Oligodendrocyte environment generates the
GCI-α-Syn strain. a, Immunohistochemistry of adjacent sections from
human or mouse brains with Syn303 and Syn7015. First row, adjacent
brain sections of cases of Lewy body disease and MSA used for the
extraction of LB-α-Syn and GCI-α-Syn for injection. Second row, adjacent
brain sections of KOM2 mice injected with LB-α-Syn prepared from the
brain tissue shown in the first row. OPT and CP are shown. Whereas the
LB-α-Syn used for the injections is Syn7015-negative, the oligodendrocyte
pathology that is induced is Syn7015-positive. Third row, adjacent brain
sections of KOM2 mice injected with GCI-α-Syn prepared from the
brain sample shown in the first row. OPT and CP are shown. Fourth row,
adjacent brain sections of M83 mice with α-Syn pathology. Midbrain
(MB) and pons are shown. b, Brain sections from KOM2 mice injected
with GCI-α-Syn or LB-α-Syn in the thalamus were double-labelled with
Syn506 and antibodies against P62 (left panel) or ubiquitin (right panel).
c, Brain sections from GCI-α-Syn- or LB-α-Syn-injected KOM2 mice were
double-labelled with Syn506 and GFAP. Both ipsilateral and contralateral
optic tracts are shown. d, Adjacent sections of substantia nigra and cortex
from two different cases of MSA were stained with Syn7015 and Syn303.
Scale bars: 50 µm (a–c), 12.5 µm (a insets), 20 µm (b inset) and 30 µm
(d). The experiments in a–d have been repeated three times with similar
results.




Extended Data Fig. 10 panel labels: PFF-KOM2-Syn 200 pg; PFF 200 pg; PFF 2 ng;
PFF 20 ng; PFF + KOM2 200 pg; PFF-Oligo-Syn 200 pg; PFF-HipN-Syn 200 pg;
PFF-CtxN-Syn 200 pg; PFF-QBI-Syn 200 pg; GCI 200 pg; GCI-N-P1 200 pg;
GCI-N-P2 200 pg; GCI-N-P3 200 pg. Staining: pSyn (81A)/DAPI. Immunoblot
panels show sarkosyl-soluble and sarkosyl-insoluble fractions (37 kDa marker).


Extended Data Fig. 10 | α-Syn pathology induced by passaged PFF and
GCI. a, Insoluble phosphorylated α-Syn in QBI-WT-Syn cells seeded by
PFFs, PFFs that have been passaged in KOM2 mice (PFF-KOM2-Syn) or
PFFs that were combined with the sarkosyl-insoluble fraction prepared
from uninjected KOM2 mice (PFF + KOM2). b, Insoluble phosphorylated
α-Syn in QBI-WT-Syn cells induced by an equal amount (200 pg) of
PFF-oligo-Syn, PFF-HipN-Syn, PFF-CtxN-Syn, PFF-QBI-Syn and PFFs.
c, Soluble and insoluble fractions from QBI-Syn-WT cells treated with
the same amount of PFF-oligo-Syn, PFF-HipN-Syn, PFF-CtxN-Syn,
PFF-QBI-Syn and PFF were immunoblotted with antibodies against
total or pS129 α-Syn. d, Densitometric quantification of insoluble versus
soluble α-Syn for experiments described in c (n = 3 biological replicates).
e, Insoluble phosphorylated α-Syn in QBI-WT-Syn cells induced by
GCI-α-Syn and GCI-α-Syn that has been passaged in primary neurons
multiple times (that is, GCI-N-P1, GCI-N-P2, GCI-N-P3). f, Soluble and
insoluble fractions from QBI-Syn-WT cells treated with the same amount
of GCI, GCI-N-P1, GCI-N-P2, GCI-N-P3 and PFF were immunoblotted
with antibodies against total α-Syn or pS129 α-Syn. g, Densitometric
quantification of insoluble versus soluble α-Syn for experiments described
in f (n = 3 biological replicates). Statistics shown in d, g are one-way
ANOVA followed by Dunnett’s post hoc test comparing each group
with PFF-oligo-Syn in d or GCI in g. Results shown as mean ± s.e.m.
**P < 0.01; ***P < 0.01; ns, not significant. Scale bars: 50 µm (a, b, e). The
experiments in a–c and e–f have been repeated three times with similar
results. See Supplementary Table 5 for statistical details. For gel source
data, see Supplementary Fig. 1.

RSPO2 inhibition of RNF43 and ZNRF3 governs limb
development independently of LGR4/5/6


Emmanuelle Szenker-Ravi, Umut Altunoglu, Marc Leushacke, Célia Bosso-Lefèvre, Muznah Khatoo,
Hong Thi Tran, Thomas Naert, Rivka Noelanders, Amin Hajamohideen, Claire Beneteau, Sergio B. de Sousa,
Birsen Karaman, Xenia Latypova, Seher Basaran, Esra Börklü Yücel, Thong Teck Tan, Lena Vlaeminck, Shalini S. Nayak,
Anju Shukla, Katta Mohan Girisha, Cédric Le Caignec, Natalia Soshnikova, Zehra Oya Uyguner, Kris Vleminckx,
Nick Barker, Hülya Kayserili & Bruno Reversade


The four R-spondin secreted ligands (RSPO1-RSPO4) act via 
their cognate LGR4, LGR5 and LGR6 receptors to amplify WNT 
signalling’ >. Here we report an allelic series of recessive RSPO2 
mutations in humans that cause tetra-amelia syndrome, which is 
characterized by lung aplasia and a total absence of the four limbs. 
Functional studies revealed impaired binding to the LGR4/5/6 
receptors and the RNF43 and ZNRF3 transmembrane ligases, and 
reduced WNT potentiation, which correlated with allele severity. 
Unexpectedly, however, the triple and ubiquitous knockout of Lgr4, 
Lgr5 and Lgr6 in mice did not recapitulate the known Rspo2 or Rspo3 
loss-of-function phenotypes. Moreover, endogenous depletion or 
addition of exogenous RSPO2 or RSPO3 in triple-knockout Lgr4/5/6 
cells could still affect WNT responsiveness. Instead, we found that 
the concurrent deletion of rnf43 and znrf3 in Xenopus embryos was 
sufficient to trigger the outgrowth of supernumerary limbs. Our 
results establish that RSPO2, without the LGR4/5/6 receptors, serves 
as a direct antagonistic ligand to RNF43 and ZNRF3, which together 
constitute a master switch that governs limb specification. These 
findings have direct implications for regenerative medicine and 
WNT-associated cancers. 

Limb development is governed by a three-dimensional signalling sys- 
tem that defines proximodistal, anteroposterior and dorsoventral axes’. 
Tetra-amelia with lung hypo/aplasia syndrome (TETAMS; MIM data- 
base entry 273395) is an extreme condition, in which fetuses lack all four 
limbs. TETAMS without lung hypoplasia has been linked to a WNT3 
nonsense mutation in humans°. The four R-spondin ligands (RSPO1- 
RSPO4) act as enhancers of WNT signalling**. They bind to their cog- 
nate receptors LGR4, LGR5 and LGR6 via their Furin-like 2 domain, 
and to the E3 ubiquitin ligases RNF43 and ZNRF3 via their Furin- 
like 1 domain¹⁰. This tripartite interaction prevents WNT receptor
degradation mediated by RNF43 or ZNRF3. Rspo2 mutation in mice
leads to limb truncations reminiscent of tetra-amelia¹¹⁻¹⁴, but a role
for its receptors has not been substantiated, to our knowledge, during 
limb morphogenesis. 

Here, we describe five families with eleven affected individuals that 
display severe developmental limb defects. In family 1, four affected 
fetuses presented with radial ray deficiency with humeral involvement, 
absence of tibiae with or without femoral deficiency, and absence of digits 
on the preaxial side (Fig. 1a, b, Extended Data Fig. 1a and Extended
Data Table 1). We propose to name this severe dysostosis humero- 
femoral hypoplasia with radio-tibial ray deficiency (HFH-RTRD). 
Exome sequencing identified a homozygous p.Arg69Cys mutation in 
RSPO2 (Extended Data Fig. 1b) that affects a residue conserved in all 
R-spondin paralogues and homologues (Fig. 1c and Extended Data 
Fig. 2a). The analogous mutation p.Arg64Cys in RSPO4 was shown to 
cause congenital anonychia¹⁵. The seven affected fetuses from families 2
to 5 presented with complete absence of four limbs, lung hypo/aplasia, 
cleft lip-palate, and labioscrotal fold aplasia, all characteristic of 
TETAMS (Fig. 1a, b, Extended Data Fig. 1a and Extended Data Table 1). 
A p.Gln70* nonsense mutation in RSPO2 was identified in family 2¹⁶
(Fig. 1a, c). In family 3¹⁷, array comparative genomic hybridization
(array-CGH) analysis identified a biallelic deletion of 154 kilobases (kb)
spanning intron 5 and exon 6 of RSPO2 (Fig. 1a, c and Extended Data
Fig. 1c). In family 4, exome sequencing revealed a recessive p.Glu137*
nonsense mutation in RSPO2 (Fig. 1a, c). In family 5 with three
consecutive TETAMS fetuses, a homozygous RSPO2 frameshift
p.Gly42Valfs*49 mutation was identified (Fig. 1a–c). These results
establish a new aetiology for tetra-amelia and demonstrate the crucial
involvement of RSPO2 in craniofacial, limb and lung development in humans.

We selected the p.Arg69Cys (R69C) and p.Gln70* (Q70X) mutations
that are responsible for HFH-RTRD and TETAMS, respectively, to
assess whether mutant RSPO2 retained binding to its cognate receptors. 
By co-immunoprecipitation analysis, wild-type RSPO2 was pulled down 
by LGR5, whereas the RSPO2(F105A/F109A) mutant, which specifically 
abrogates binding to LGRs (ref. 18), and the RSPO2(R69C) and RSPO2(Q70X) 
mutants were not (Fig. 1d and Extended Data Fig. 2b, c). 
Although the RSPO2(F105A/F109A) mutant could be readily 
co-immunoprecipitated by RNF43 or ZNRF3, neither RSPO2(R69C) nor 
RSPO2(Q70X) could interact with RNF43 or ZNRF3 (Fig. 1e and 
Extended Data Fig. 2d). Similarly, only wild-type RSPO2, but not the 
RSPO2(R69C) or RSPO2(Q70X) mutants, could be retained on the 
surface of HEK293T cells overexpressing LGR5 or RNF43 (Extended 
Data Fig. 2e). Wild-type RSPO2, and to a lesser extent RSPO2(R69C) 
but not RSPO2(Q70X), could enhance WNT3A-mediated activation 
of SUPERTOPFLASH (STF) luciferase (Fig. 1f and Extended Data 
Fig. 2f). These in vitro results indicate that the R69C and Q70X muta- 
tions diminish the ability of RSPO2 to bind to LGRs, RNF43 or ZNRF3, 
and to amplify β-catenin-dependent WNT signalling. These signalling 
defects correlate with the severity of the fetuses’ phenotypes—the non- 
sense Q70X mutation (responsible for TETAMS) behaving as a null 
mutation, and the R69C mutation (responsible for HFH-RTRD) as a 
hypomorphic allele. 

Fig. 1 | Identification of RSPO2 mutations in fetuses presenting with severe limb defects. a, Pedigrees of family 1 (F1) with HFH-RTRD and families 2–5 (F2–F5) with TETAMS, and RSPO2 germline mutation status for available family members. b, Pictures and radiographs illustrating limb defects in a fetus with HFH-RTRD, and complete absence of limbs and lungs in a fetus with TETAMS. gw, gestational weeks. c, RSPO2 genomic (top) and protein (bottom) structures with identified mutations. Ex., exon. UTR, untranslated region. d, Co-immunoprecipitation (IP) of alkaline phosphatase (AP)-tagged RSPO2 lacking the C-terminal domain (ΔC; RSPO2-ΔC-AP) with Flag-tagged extracellular domains (ECD) of LGR5 (LGR5-ECD-Flag). WT, wild-type. e, Co-immunoprecipitation of RSPO2-ΔC-AP with RNF43-ECD-Flag. Asterisks indicate non-specific bands. Experiments in d and e were repeated three times. f, SUPERTOPFLASH assay in HEK293T-STF cells transfected with WNT3A and the indicated RSPO2 constructs. n = 3 biological replicates. Data are mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, one-way analysis of variance (ANOVA) with Bonferroni's correction. For gel source data, see Supplementary Fig. 1.

In mice, Rspo2 is expressed in the apical ectodermal ridge of the growing limb and in the lung mesenchyme. Accordingly, Rspo2 homozygous mutant mice show lung hypoplasia and limb truncations. Consistent with a reduction in canonical WNT signalling, these Rspo2 phenotypes could be recapitulated in embryos born to gestating mice fed with a pan-WNT inhibitor (Fig. 2a–d). Because individual Lgr4, Lgr5 or Lgr6 mutant mice and LGR4 or LGR6 human knockout individuals do not present any limb or lung phenotypes, we surmised that functional redundancy might exist between these three receptors. We therefore set out to recapitulate TETAMS in mice by genetically deleting Lgr4/5/6 in all embryonic tissues (Extended Data Fig. 4a). Lgr4+/−Lgr5+/−Lgr6−/− animals were incrossed, yielding a 1 in 16 chance of obtaining triple-knockout Lgr4/5/6 offspring (Extended Data Fig. 3a). Five triple-knockout Lgr4/5/6 embryos were obtained at embryonic day (E) 14.5 or E18.5. Although they displayed the expected Lgr4 and Lgr5 phenotypes (Extended Data Fig. 4b–e), none exhibited phenotypes reminiscent of tetra-amelia with lung agenesis (Fig. 2e–g and Extended Data Fig. 3b, c). This suggests that the LGR4, LGR5 and LGR6 receptors are not functionally redundant and do not mediate RSPO2 signalling for limb and lung morphogenesis. Other developmental phenotypes such as cleft palate and ankyloglossia were common between Rspo2 and triple-knockout Lgr4/5/6 embryos (Extended Data Figs. 3d, 4e, Extended Data Table 1). Notably, Rspo3 knockout mouse embryos die at E10.5 owing to defective vascularization, a phenotype not seen in Lgr4/5/6 triple-knockout embryos (Extended Data Fig. 3e). Thus, RSPO3-mediated vascularization is also LGR4/5/6-independent. These in vivo genetic findings suggest that RSPO2 and RSPO3 may engage other receptors for limb, lung and vascular development. 
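
The 1-in-16 expectation for triple-knockout offspring follows from Mendelian segregation at the three loci. The sketch below checks that arithmetic under two assumptions that the text does not state explicitly: the three Lgr loci segregate independently, and all genotypes survive equally well to the stage of collection. The total of 82 embryos is taken from the Methods; the function names are illustrative only.

```python
from fractions import Fraction


def homozygous_null_fraction(parent_a: str, parent_b: str) -> Fraction:
    """Probability that an offspring is -/- at a single locus.

    Genotypes are two-character strings of '+' and '-' alleles,
    e.g. '+-' for a heterozygote and '--' for a homozygous null.
    """
    p_a = Fraction(parent_a.count("-"), 2)  # chance parent A transmits '-'
    p_b = Fraction(parent_b.count("-"), 2)  # chance parent B transmits '-'
    return p_a * p_b


def triple_knockout_fraction(cross: dict) -> Fraction:
    """Multiply per-locus probabilities (assumes unlinked loci)."""
    result = Fraction(1)
    for parent_a, parent_b in cross.values():
        result *= homozygous_null_fraction(parent_a, parent_b)
    return result


# Incross of Lgr4+/- Lgr5+/- Lgr6-/- animals, as described in the text.
incross = {
    "Lgr4": ("+-", "+-"),
    "Lgr5": ("+-", "+-"),
    "Lgr6": ("--", "--"),
}

p_tko = triple_knockout_fraction(incross)
expected_among_82 = float(p_tko * 82)  # 82 embryos analysed (Methods)

print(p_tko)              # 1/16
print(expected_among_82)  # 5.125
```

Under these assumptions about five triple-knockout embryos would be expected among 82, which is in line with the five that were recovered.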

The expression of Lgr4/5/6, Rnf43 and Znrf3 was examined in the recovered mutant embryos at E14.5. As expected, they had no Lgr4 or Lgr5 expression (Extended Data Fig. 5a–e). Some residual Lgr6 expression was observed, which may originate from an alternative downstream methionine that would delete the LGR6 signal peptide. During limb development, Lgr4 and Lgr5 were not expressed in the overlying ectoderm of the limb bud, whereas ectodermal Lgr6 and Wnt3 co-localized with Rspo2 in the apical ectodermal ridge. The expression of Znrf3 was ubiquitous, whereas Rnf43 was restricted to the ectoderm (Fig. 2h). In developing lungs, robust Lgr6 expression was detected in the smooth muscle cells (SMCs), whereas Lgr4 and Lgr5 were expressed at low levels in both the epithelium and mesenchyme lineages. Znrf3 was ubiquitous and overlapped with Rspo2 in the mesenchyme, whereas Rnf43 expression was restricted to the lung epithelium (Extended Data Fig. 5f). Using the enhanced green fluorescent protein (eGFP) reporter of the Lgr4/5/6 knock-in alleles (Extended Data Fig. 4a), we confirmed eGFP expression in a single layer of vimentin-positive cells adjacent to the E-cadherin-positive lung epithelium (Fig. 2i). In summary, these results demonstrate consistent co-expression of Rspo2 with Znrf3 at E14.5, whereas only partial overlap with Lgr4/5/6 was seen. 

Fig. 2 | Mouse Lgr4/5/6 triple-knockout embryos do not recapitulate the Rspo2 and Rspo3 phenotypes. a–d, PORCN inhibition using Wnt-C59 during embryogenesis leads to limb and lung defects. a, Experimental scheme. b, Scoring of limb phenotypes in vehicle-treated (n = 30) and Wnt-C59-treated (n = 46) embryos. ***P < 0.001, two-sided Fisher’s exact test. c, Representative images of treated embryos. Grey (no zeugopod and autopod) and white (amelia) arrows denote limb defects. Scale bars, 1 mm. d, Representative images of lungs from treated embryos. Scale bar, 1 mm. e–g, Triple-knockout Lgr4/5/6 does not lead to limb or lung defects. Representative photos of a Lgr6 knockout (n = 4) (e) and a triple-null Lgr4/5/6 (n = 5) (f) embryo at E14.5, with intact limbs and lungs. Dotted lines indicate the size difference and expected liver position. Scale bars, 1 mm. g, PCR-based genotyping. KO, knockout. h, Duplex RNAscope images for the indicated transcripts (blue) and Rspo2 (pink) in transverse sections of wild-type forelimbs. AER, apical ectodermal ridge. Strongly expressed genes are denoted in bold (summary on the right). Scale bars, 0.1 mm. i, Haematoxylin and eosin (H&E) and antibody staining in coronal sections of wild-type (top) and triple-knockout (bottom) Lgr4/5/6 lungs. Scale bars, 50 μm. Experiments in h and i were repeated three times.

We noticed that mutant RSPO2(F105A/F109A) and RSPO3(F106A/F110A), which cannot bind LGRs, are still able to enhance WNT signalling in HEK293T-STF cells (Fig. 3a). Neural progenitor cells (NPCs), induced pluripotent stem (iPS) cells and SV40-immortalized dermal fibroblasts were derived from E14.5 wild-type and mutant embryos to test the activity of exogenous and endogenous R-spondin ligands in triple Lgr4/5/6-null cells (Extended Data Fig. 6a–c). Recombinant RSPO2 and RSPO3, but not RSPO1 and RSPO4, could still amplify WNT3A-mediated signalling in Lgr4/5/6 triple-knockout immortalized STF-fibroblasts (Fig. 3b, c). Most importantly, short interfering RNA (siRNA)-mediated depletion of endogenous Rspo2 or Rspo3 was sufficient to significantly decrease expression of the WNT direct target gene Axin2 in WNT3A-treated Lgr4/5/6 triple-knockout fibroblasts. This may be explained by RSPO2 and RSPO3 inhibition of ZNRF3, because siRNA-depletion of endogenous Znrf3 resulted in increased endogenous Axin2 expression (Fig. 3d, e and Extended Data Fig. 6d). These in vitro data support our in vivo results, and confirm that cells that lack LGR4/5/6 are still sensitive to RSPO2/3-mediated WNT signalling enhancement. Similar observations were made in human haploid cells mutant for LGR4/5/6. 

Fig. 3 | Exogenous and endogenous RSPO2/3 signal in Lgr4/5/6 triple-knockout mouse embryonic fibroblasts. a, SUPERTOPFLASH assay in HEK293T-STF cells transfected with WNT3A and the indicated RSPO2 or RSPO3 constructs. n = 4 biological replicates. b, qPCR analysis for Axin2 in Lgr4/5/6 triple-knockout SV40-immortalized mouse fibroblasts treated with WNT3A and/or RSPO1–RSPO4. n = 3 biological replicates. c, SUPERTOPFLASH assay in wild-type (top, n = 6) and Lgr4/5/6 triple-knockout (bottom, n = 3 biological replicates) SV40-immortalized mouse fibroblasts treated with WNT3A and/or RSPO1–RSPO4. d, e, qPCR analysis for Axin2 in Lgr4/5/6 triple-knockout SV40-immortalized mouse fibroblasts transfected with the indicated siRNAs, and treated with or without WNT3A. n = 3 biological replicates. Data are mean ± s.e.m. NS, not significant. *P < 0.05, **P < 0.01, ***P < 0.001, one-way ANOVA with Bonferroni's correction.

To further validate the causal link between RSPO2 deficiency and amelia, we unilaterally injected rspo2 guide RNA (gRNA) with Cas9 protein into Xenopus tropicalis embryos at the two-cell stage (Fig. 4a). 


Targeted next-generation sequencing and BATCH-GE analysis demonstrated very high in vivo efficiencies for rspo2 deletions (Supplementary Table 1), which caused marked unilateral forelimb and hindlimb amelia (Fig. 4b, d). Because we showed in mice that LGR4, LGR5 and LGR6 are not involved in limb development, we examined RNF43 and ZNRF3, which may serve as alternative cell-surface RSPO2 receptors. The use of Xenopus allows us to bypass a possible mammalian-specific RNF43 and ZNRF3 requirement for placental vascularization. Both ligases were uniformly expressed in developing limb buds (Fig. 4c). Two TALEN pairs for each gene were selected for their very high cutting efficiency (Supplementary Table 1). Although limb defects were rare, or absent, within single rnf43 or znrf3 mutants, unilateral ectopic limbs were very prominent in znrf3/rnf43 double-mutant frogs (Fig. 4d–h). Alizarin red and alcian blue staining revealed a diverse spectrum of limb phenotypes ranging from diplopodia to complete polymelia, with bifurcations arising at distinct locations across the stylopod, zeugopod or autopod. Extreme cases presented up to quadruplication of forelimbs (Fig. 4h and Extended Data Fig. 7b), a phenotype that is the inverse of rspo2 crispant frogs that display total amelia (Extended Data Fig. 7a). We conclude that in the context of limb development, RSPO2 behaves as a direct antagonistic ligand to RNF43 and ZNRF3 without the need for LGR4/5/6. This ligand–receptor pair constitutes a master switch that governs the number of limbs an embryo should form. It will be important to assess whether this pathway can in part contribute to the disappearance of limbs during evolution, particularly in cetaceans and snakes, which are tetrapods that have become bi-amelic and tetra-amelic, respectively. It is also tempting to speculate whether the same embryonic signals may be re-mobilized in salamanders, which are capable of complete adult limb regeneration after amputation. 

Fig. 4 | Frogs mutant for both rnf43 and znrf3 display complete limb duplications. a, Experimental scheme using X. tropicalis. b, Representative rspo2 crispant (stage 63; n = 21). Scale bar, 0.5 cm. c, znrf3 and rnf43 in situ hybridization in stage 50 limb buds. Scale bar, 300 μm. The experiment was repeated three times. d, Scoring of limb phenotype. n denotes number of froglets. NS, not significant; ***P < 0.001, significantly different from normal (χ² test). e, Stage 67 znrf3/rnf43 double mutant with a duplicated right hindlimb. Scale bar, 0.5 cm. f, g, External view and alizarin red and alcian blue staining of rnf43/znrf3 double mutants (stages 62 (f) and 66 (g)). Scale bars, 0.3 cm. h, rnf43/znrf3 double-mutant tadpole (stage 59) displaying quadruplication of the right forelimb (three are visible). Scale bar, 0.2 cm. In e–h, 61 znrf3/rnf43 double-mutant froglets with polymelia were obtained. i, Updated model for LGR-dependent R-spondin processes (left), and LGR-independent RSPO2/3 signalling (right), which may involve the activity of a hitherto unknown receptor X.

The current model suggests that RSPO–LGR form ligand–receptor pairs that serve to increase WNT signalling through direct inhibition of the two E3 ligases RNF43 and ZNRF3, which otherwise ubiquitinate WNT receptors for degradation. Here we challenge this view and show that during embryogenesis, the concomitant loss of LGR4, LGR5 and LGR6 receptors does not phenocopy the loss of RSPO2 or RSPO3 (Fig. 4i). 

Gain-of-function variants in RSPO2 and RSPO3 and loss-of-function alleles in RNF43 and ZNRF3 are the most frequent somatic mutations in colorectal cancer patients. LGR5-positive cells have been shown to represent the major cell of origin of colorectal cancer; however, pathogenic mutations in this WNT-associated receptor have not been documented so far. Our findings that RSPO2 and RSPO3 can inhibit RNF43 and ZNRF3, without the need for LGR4/5/6, raise the question of whether LGRs have any functional relevance to carcinogenesis. The ubiquitous triple-knockout Lgr4/5/6 during embryogenesis serves as a proof-of-concept for subsequent organ-specific deletions, and should enable this question to be addressed. 
METHODS 


Fetuses and clinical assessment. The five families included in this study were 
enrolled from genetic departments of Istanbul, Turkey; Coimbra, Portugal; Nantes, 
France; and Mangalore, India. Eight out of eleven affected individuals had been 
clinically and radiologically evaluated by experienced clinical geneticists. Autopsy 
was performed in five cases (see Extended Data Table 1). Families 2 and 3 had 
been previously reported (refs. 16, 17). Written informed consent in accordance with the 
Helsinki protocol was obtained from family members before inclusion to the 
research protocol. Consent to publish photos was obtained from the families. The 
studies were performed in compliance with all relevant ethical regulations from 
the respective institutions. Approvals were obtained from the Istanbul University, 
Istanbul Medical Faculty ethical committee, Turkey (CRANIRARE: 2008/1194 
and CRANIRARE-II: 2012/743-1061) as well as the Koç University School of 
Medicine (KUSoM) ethical committee, Istanbul, Turkey (2015.120.IRB2.047 
CRANIRARE-2) for families 1 and 2; from the ‘Comissão de Ética do Centro 
Hospitalar de Coimbra’, Coimbra, Portugal (2009/42, 1724/Sec) for family 3; from 
the ‘Comité de protection des personnes Ouest IV’, Nantes, France (DC-2011-1399) 
for family 4; and from the institutional ethics committee of Kasturba Hospital, 
Manipal, India (ECR/146/Inst/KA/2013, IEC 430/2013) for family 5. 
Genotyping and exome sequencing. DNA was extracted from skin biopsy samples 
of affected cases and from peripheral blood leukocytes of parents and healthy 
siblings by standard procedures. Affected individuals, or parents in the absence 
of samples from the affected individuals, were previously screened and excluded 
for any functional sequence variations/mutations in the WNT3 (NM_030753) 
and WNT7A (NM_004625) genes. Whole-exome capture of subjects II:6 and II:7 
from family 1 was performed using the Agilent SureSelect Human All Exon v4.0 kit, 
and sequencing was done on the Illumina HiSeq2000 platform using TruSeq v3 
chemistry at a mean coverage of 50×. Reads provided in FASTQ files were mapped 
to the human genome (hg19) using Burrows-Wheeler Aligner (BWA package version 0.6.2). Local 
realignment was performed by Genome Analysis Tool Kit (GATK). Duplicated 
reads were marked for exclusion from further analysis using Picard (version 
1.83) tool. Further alignment manipulations were performed by Samtools (ver- 
sion 0.1.18). Base quality (Phred scale) scores were recalibrated using the GATK 
covariance recalibration for each sample (Oxford Gene Technology), and variant 
calling was performed using the ANNOVAR tool with avSNP release 142, the 1000 
Genomes 2014 release and the NIH-NHLBI 6500 exome database version 2. 
All alterations, including overlapping homozygous variants, with a 
minimum read depth of 20 were considered for further evaluation and browsed 
on the OGT NGS (Oxford Gene Technology Next Generation Sequencing) and IGV 
(Integrative Genomics Viewer) browsers. Screening for RSPO2 mutations in 
additional affected individuals or parents, and segregation validation were performed 
by Sanger sequencing, with PCR primers designed to cover all the coding exons 
and the flanking regions according to RefSeq accession number NM_178565 
(Extended Data Table 2a). 
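
The depth and zygosity criterion described above can be pictured as a simple filter. The sketch below is illustrative only. It reads "overlapping homozygous variants" as variants that are homozygous in both sequenced siblings, which is an interpretation rather than something the text spells out. The record fields, the 0.9 variant-allele fraction used to call a site homozygous, and the example records are all assumptions; only the minimum read depth of 20 comes from the text.

```python
from dataclasses import dataclass

MIN_DEPTH = 20          # minimum read depth, from the text
HOM_ALT_FRACTION = 0.9  # assumed cut-off for calling a site homozygous


@dataclass(frozen=True)
class Variant:
    key: str             # hypothetical "chrom:pos:ref>alt" identifier
    depth: int           # reads covering the site
    alt_fraction: float  # fraction of reads supporting the variant allele


def is_candidate(v: Variant) -> bool:
    """Keep well-covered sites where nearly all reads carry the variant."""
    return v.depth >= MIN_DEPTH and v.alt_fraction >= HOM_ALT_FRACTION


def shared_homozygous(sample_a, sample_b):
    """Return keys of candidate variants present in both samples."""
    keys_a = {v.key for v in sample_a if is_candidate(v)}
    keys_b = {v.key for v in sample_b if is_candidate(v)}
    return sorted(keys_a & keys_b)


# Made-up records for two affected siblings; not data from the study.
sibling_1 = [
    Variant("chrA:100:C>T", depth=48, alt_fraction=0.98),
    Variant("chrA:200:G>A", depth=12, alt_fraction=1.00),  # too shallow
    Variant("chrB:300:A>G", depth=55, alt_fraction=0.51),  # heterozygous
]
sibling_2 = [
    Variant("chrA:100:C>T", depth=37, alt_fraction=0.97),
    Variant("chrA:200:G>A", depth=40, alt_fraction=0.99),
    Variant("chrB:300:A>G", depth=60, alt_fraction=0.95),
]

shared = shared_homozygous(sibling_1, sibling_2)
print(shared)  # ['chrA:100:C>T']
```

In practice such a shortlist would then be annotated and checked by Sanger sequencing, as the paragraph above describes.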

Array-CGH and SNP-array analysis. Oligonucleotide array-CGH was per- 
formed using SurePrint G3 Human CGH Microarray ISCA 4 x 180K v2 (Agilent 
Technologies), and the SNP-Array adopted was 300 K HumanCytoSNP-12v2-1 
(Illumina Inc.). The 180 K kit has an overall median probe spacing of 13 kb, and 
the SNP-Array has 6.2 kb. Analyses were performed according to the protocols 
provided by the suppliers (Agilent Oligonucleotide Array-Based CGH for Genomic 
DNA Analysis and Illumina Karyostudio & Bluefuse Multi Softwares). Arrays 
were scanned using a NimbleGen MS 200 for Agilent SurePrint array and I-Scan 
instrument for HumanCytoSNP-12v2-1. Genomic positions were based on the 
UCSC February 2009 human reference sequence (hg19) (NCBI build 37 reference 
sequence assembly). 

Constructs. An RSPO2-ΔC-AP-pCDNA3 plasmid (gift from C. Niehrs) encoding the wild-type human RSPO2 open-reading frame (ORF) without the C-terminal domain (ΔC) (NP_848660 amino acids 1–206) tagged with alkaline phosphatase (AP) was used to generate the RSPO2-ΔC-AP R69C and F105A/F109A mutant constructs (RSPO2(R69C) and RSPO2(F105A/F109A)) using the QuikChange Mutagenesis Kit (Stratagene 200522). R69C is the missense mutation found in family 1, and the F105A/F109A mutations specifically abolish the interaction of RSPO2 with the LGRs. RSPO2-Δ70-AP (RSPO2(Q70X), mutation found in family 2) was obtained by PCR and ligation (deleting the C-terminal region of the protein downstream of position 69). Deletion of the RSPO2 C-terminal domain (RSPO2-ΔC) decreases its retention on the cell surface without affecting its receptor binding and WNT enhancement properties. A construct for a secreted alkaline phosphatase was used as a negative control. For cell surface binding assay experiments, V5-LGR5-pCS2+ (gift from C. Niehrs) and pCMV6-Entry-RNF43 (Origene RC214013) plasmids were used. For co-immunoprecipitation experiments, the signal peptide and extracellular domains (ECD) of LGR5 (NP_001264156 amino acids 1–557), ZNRF3 (NP_001193927 amino acids 1–219) and RNF43 (NP_060233 amino acids 1–197) were subcloned in pCS2+ with a Flag tag at their C terminus. 
HEK293T and HEK293T-STF cell culture. HEK293T (from ATCC) and 
HEK293T-STF (SUPERTOPFLASH, gift from D. Virshup, from ATCC) cell lines 
have not been authenticated but were tested negative for mycoplasma contam- 
ination. They were cultured on plates coated with poly-L-lysine (Sigma P4707) 
with the following medium: DMEM High glucose (HyClone SH30081.01) with 
10% fetal bovine serum (Thermo Scientific SH30070), and 2 mM L-glutamine 
(ThermoFisher Scientific 25030081). Cells were transfected with DNA plasmid 
using the FuGENE HD Transfection Reagent (Promega E2312) in OptiMEM 
medium (Gibco 31985070). 

Expression and secretion studies. For protein extraction, HEK293T cells were lysed using an appropriate amount of RIPA buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 0.1% NP-40, 0.05% sodium deoxycholate) supplemented with proteinase inhibitors (Complete, Roche 04693159001). Lysates were centrifuged at 17,000g for 15 min at 4°C to remove cell debris, and the supernatants (protein extracts) were collected. For secretion studies, culture medium was changed 24 h after transfection with a serum-free medium Pro293a-CDM (Lonza 12-764Q) supplemented with L-glutamine without or with 50 μg ml⁻¹ of heparin (Sigma H3149). Secretion was allowed for 48 h before collection of the conditioned media. For western blotting, samples were electrophoresed with reducing Laemmli loading buffer after denaturation at 95°C for 10 min. The protein ladder (Bio-Rad 161-0377) and denatured and reduced samples were loaded onto 4–20% gradient precast gels (Bio-Rad Criterion 567-1093) in 1× running buffer (25 mM Tris, 200 mM glycine, 0.1% SDS) and run at 80–180 V until the desired separation was reached. Gels were transferred onto 0.2 μm PVDF membranes (Bio-Rad Criterion 170-4157) using the Trans-Blot Turbo transfer system for 7 min. Membranes were blocked for 1 h at room temperature with 5% milk in TBST (50 mM Tris-HCl pH 7.5, 150 mM NaCl, with 0.05% Tween-20). Membranes were incubated with primary antibody diluted in 5% milk in TBST at 4°C overnight (anti-alkaline phosphatase, 1:2,000, GenHunter Q310; anti-Flag, 1:1,000, Cell Signaling 14793S; anti-GAPDH, 1:4,000, Santa Cruz 47724). After washes in TBST, membranes were incubated for 1 h at room temperature with secondary antibodies (Mouse-HRP 71503510 or Rabbit-HRP 711035152, 1:4,000, Jackson Immuno) in 5% milk in TBST. After several washes in TBST, the signal was revealed with the HRP substrate (Thermo Scientific SuperSignal 34080/34076/34096) for 3 min at room temperature. Membranes were then exposed to CL-Xposure films (Thermo Scientific 34091), and developed in a Carestream Kodak developer. 

Co-immunoprecipitation experiments. Conditioned media containing either the RSPO2-ΔC-AP forms or the different receptor-ECD-Flag proteins were obtained after transfection of HEK293T cells. Conditioned media with equivalent amounts of each RSPO2-ΔC-AP form were first mixed with conditioned medium containing the receptor-ECD-Flag of interest for 4 h at 4°C. At the same time, Protein G Dynabeads (Novex 10003D) were conjugated with anti-Flag antibodies (Sigma F3165) for 4 h at 4°C. The media mixes (inputs) were then incubated with the conjugated beads overnight at 4°C. After washes, the beads were resuspended in 2× reducing Laemmli loading buffer. After centrifugation, the supernatants (immunoprecipitates) were subsequently used for western blotting. 

Cell-surface binding assay. Twenty-four hours after transfection of HEK293T cells with V5-LGR5-pCS2+ (gift from C. Niehrs), pCMV6-Entry-RNF43 (Origene RC214013) or pCS2+ (empty vector) in 24-well plates, the cell culture medium was replaced for 3 h with 300 μl of conditioned medium containing equivalent amounts of RSPO2-ΔC-AP proteins (as determined by western blot), to assess their cell-surface binding. After washes with PBS, cells were lysed with PBS containing 1% Triton X-100 and 1× proteinase inhibitor, and then incubated at 65°C for 1 h to inhibit endogenous alkaline phosphatase activity. After centrifugation at 17,000g for 2 min, supernatants (protein extracts) were collected and the protein concentration was measured (Pierce BCA protein assay kit 23225). Eighty microlitres containing the same quantity of total protein for each condition was added to 80 μl of BM Purple (Roche 11442074001) and incubated overnight at 4°C in the dark for chromogenic development. Pictures were taken with the NCS Microtek Artixscan F1 scanner. 

SUPERTOPFLASH luciferase assay. HEK293T-STF cells were transfected with the human WNT3A gene (hWNT3A-pCS2+) and the Renilla luciferase (pRL-CMV vector). RSPO2-ΔC-AP constructs were either transfected, or conditioned media were added 24 h after transfection for another 24 h of incubation. The expression of the firefly (STF) and Renilla luciferases was measured using the Dual-Luciferase Reporter Assay system (Promega E1960) 48 h after transfection. Measurements were done on opaque 96-well plates using a luminometer. Luminescence data are represented as the firefly luminescence relative to the Renilla luminescence and total protein concentration. Plotted are the values relative to the values for WNT3A plus alkaline phosphatase alone. For SV40-immortalized mouse fibroblasts-STF, cells were treated for 24 h with recombinant proteins resuspended in PBS containing 0.1% BSA. Luminescence data are represented as the firefly luminescence relative to total protein concentration. Plotted are the values relative to the values for WNT3A alone. Data are the average of three independent experiments and statistical analyses were done using the PRISM5 software using a one-way ANOVA test with Bonferroni correction for multiple hypothesis testing. Not significant indicates P > 0.05. *P < 0.05, **P < 0.01 and ***P < 0.001. Error bars indicate s.e.m. 
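
The normalization described for the HEK293T-STF assay can be written out as a short calculation. This is a minimal sketch that assumes "relative to the Renilla luminescence and total protein concentration" means dividing the firefly signal by both quantities; the text does not give the exact formula. The function names and all numbers are hypothetical.

```python
def normalized_signal(firefly: float, renilla: float, protein: float) -> float:
    """Firefly signal divided by Renilla signal and by total protein.

    Assumed reading of the normalization; units are arbitrary.
    """
    return firefly / (renilla * protein)


def fold_over_control(sample: dict, control: dict) -> float:
    """Express a well relative to the reference condition
    (WNT3A plus alkaline phosphatase alone in the HEK293T-STF assay)."""
    return normalized_signal(**sample) / normalized_signal(**control)


# Hypothetical raw readings, not data from the study.
control_well = {"firefly": 1000.0, "renilla": 50.0, "protein": 2.0}
rspo2_well = {"firefly": 6000.0, "renilla": 60.0, "protein": 2.5}

fold = fold_over_control(rspo2_well, control_well)
print(fold)  # 4.0
```

For the fibroblast-STF assay, which the text normalizes to total protein only, the same sketch applies with `renilla` set to 1 for every well.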

Mouse lines. All experiments on mice were executed in accordance with the guide- 
lines and regulations of the respective institutions. Approvals were obtained in 
Singapore by the Institutional Animal Care and Use Committee (IACUC 140907, 
171263, and 171210) and in Germany by the ethical committee of the Institute of 
Molecular Biology (IMB) gGmbH (permit number 23-177-07/G 16-05-071E1). 
Lgr4-eGFP-ires-creERT2, Lgr5-eGFP-ires-creERT2 and Lgr6-eGFP-ires-creERT2 
mice have been described previously. We incrossed Lgr4+/−Lgr5+/−Lgr6−/− 
animals and analysed 82 embryos (63 embryos at E14.5, and 19 embryos at E18.5, 
both male and female). See Extended Data Table 2b for the genotyping PCR 
primers used. For Porcupine inhibitor treatments, Wnt-C59 (Tocris, 5148/10) or 
vehicle (PBS containing 0.5% methylcellulose, 0.01% Tween-20, and 5% DMSO) 
was orally administered to wild-type pregnant females at 5 μg g⁻¹ dam body weight 
every 12h between E9.75 and E14.25, and embryos were collected at E14.5. We 
analysed a total of 30 embryos from vehicle-treated mothers, and 46 embryos from 
Wnt-C59-treated mothers. No statistical method was used to predetermine sample 
size. No randomization or blinding was used during the experimental procedures. 
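
The Wnt-C59 dosing regimen above implies a fixed number of administrations per dam. The sketch below works through that arithmetic, assuming that doses are given at both the first (E9.75) and last (E14.25) time points; the 30 g dam weight is a hypothetical value chosen only to illustrate the per-dose calculation.

```python
from fractions import Fraction

DOSE_UG_PER_G = 5                 # 5 ug per g dam body weight (from the text)
INTERVAL_DAYS = Fraction(1, 2)    # every 12 h
FIRST_DOSE = Fraction("9.75")     # E9.75
LAST_DOSE = Fraction("14.25")     # E14.25


def dosing_schedule(first: Fraction, last: Fraction, interval: Fraction):
    """List embryonic days of dosing, including both end points (assumed)."""
    times = []
    t = first
    while t <= last:
        times.append(t)
        t += interval
    return times


schedule = dosing_schedule(FIRST_DOSE, LAST_DOSE, INTERVAL_DAYS)

dam_weight_g = 30                                # hypothetical dam weight
dose_per_gavage_ug = DOSE_UG_PER_G * dam_weight_g
total_ug = dose_per_gavage_ug * len(schedule)

print(len(schedule))        # 10
print(dose_per_gavage_ug)   # 150
print(total_ug)             # 1500
```

Exact fractions are used for the embryonic days so that repeated addition of 0.5 cannot drift through floating-point rounding.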
Mouse embryo sections. Embryos were fixed overnight at 4°C in 4% paraformaldehyde and stored in 70% ethanol at 4°C. For H&E staining, RNAscope and immunofluorescence experiments, embryos were subsequently paraffin-embedded and sectioned using a Leica RM2255 microtome. H&E staining was performed on deparaffinized and rehydrated 6 μm tissue sections according to standard protocols, and images were taken with the MetaSystems Metafer Slide Scanner. RNAscope experiments were performed on deparaffinized and rehydrated 6 μm tissue sections using the RNAscope 2.5 HD Duplex Assay (Advanced Cell Diagnostics ACD 322430). The ACD RNAscope probes used in this study are as follows: Mm-Rspo2-C2 (402001-C2), Mm-Lgr4 (318321), Mm-Lgr5 (312171), Mm-Lgr6 (404961), Mm-Rnf43 (400371), Mm-Znrf3 (434201), Mm-Wnt3 (312241). RNAscope images were taken using a Zeiss AxioImager Z1 upright microscope with the ZEN software. Immunofluorescence was performed on deparaffinized and rehydrated 7 μm tissue sections. Antigen retrieval was carried out 
by heating slides in a pressure cooker (121°C) for 20 min at pH 9.0 (S2367, DAKO). 
The following primary antibodies were used: chicken anti-eGFP (1:2,000, Abcam 
ab290), rabbit anti-vimentin (1:500, Abcam ab92547), mouse anti-E-cadherin 
(1:200, BD Transduction Laboratory 61018). The secondary antibodies used were: 
anti-chicken/rabbit/mouse Alexa 488/568/647 IgG (1:500, Invitrogen). Images were 
taken on the Olympus FV3000 inverted confocal microscope. 

Generation of mouse cell lines. Embryos were collected at E14.5 and primary 
dermal fibroblasts were derived from embryonic back skin explants, and primary 
NPCs were isolated from brain cortices. Primary fibroblasts were immortalized 
via SV40 large T antigen retroviral infection using standard protocol (pBABE-neo 
largeTcDNA, Addgene plasmid 1780). SV40-immortalized mouse fibroblasts- 
SUPERTOPFLASH (STF) were generated with the insertion of 7xTcf-firefly 
luciferase by retroviral infection using standard protocol (7TFP, Addgene plas- 
mid 24308). For NPC derivation, cortices were first excised from whole brains 
and mechanically dissociated using a pipette. Cells were cultured as suspension 
in NPC medium: DMEM/F12 (Gibco 11320) containing 1 x N-2 supplement 
(ThermoFisher Scientific 17502001), 1 x B-27 supplement (ThermoFisher 
Scientific 17504001), 2 mM L-glutamine (ThermoFisher Scientific 25030081), 
0.1 mM NEAA (Gibco 1140-050), 20 ng ml⁻¹ FGF-2 (R&D Systems 233-FB) 
and 20 ng ml⁻¹ EGF (R&D Systems 236-EG). NPCs in suspension were passaged 
with accutase (Merck Millipore SCR005) every 4–5 days. For adherent cultures, 
NPCs were plated onto Matrigel-coated (Corning 354231) plates in NPC medium. 
Adherent mouse NPCs were reprogrammed by transduction of human OCT4 (also 
known as POU5F1), SOX2, KLF4 and MYC (Addgene plasmids 17225, 17226, 
17227 and 18119) using a standard protocol. After 4 days, transduced cells were 
reseeded onto irradiated mouse embryonic fibroblasts in mouse iPS cell medium: 
knockout-DMEM (ThermoFisher Scientific 1029018) supplemented with 10% 
Knock Out Serum Replacement (ThermoFisher Scientific 10828028), 10% FBS 
(ThermoFisher Scientific 16000044), 0.1 mM 2-mercaptoethanol (ThermoFisher 
Scientific 21985023), 2 mM L-glutamine (ThermoFisher Scientific 25030081), 
0.2 mM NEAA (ThermoFisher Scientific 1114050) and 1,000 U ml⁻¹ mouse LIF 
(Stem Cell Technologies 78056). iPS cell colonies were picked between days 7 and 
15 and maintained in mouse iPS cell medium for expansion on gelatin-coated plates. 
Cell treatment with recombinant proteins. For WNT signalling response exper- 
iments, immortalized fibroblasts were treated for 24h in serum-free medium with 
25 ng ml⁻¹ recombinant WNT3A (R&D Systems 5036-WN), and/or 400 ng ml⁻¹ 
recombinant RSPO1 (4645-RS), RSPO2 (3266-RS), RSPO3 (3500-RS) or RSPO4 
(4575-RS) re-suspended in PBS containing 0.1% BSA. 

siRNA experiments. siGENOME SMARTpool mouse Rspo2 (Dharmacon 239405), 
Rspo3 (72780), Znrf3 (407821) or negative control (D-001206-14-05) siRNAs 
were used to transfect SV40-immortalized mouse fibroblasts. The transfection of 
37.5 nM siRNA was performed with Lipofectamine RNAiMAX (Invitrogen 13778-075) 
or DharmaFECT (Dharmacon T-2001) transfection reagents according to the 
manufacturers' protocols. 

qPCR experiments. For qPCR experiments, embryonic organs or cultured cells were
lysed in QIAGEN RLT buffer and total RNA was extracted using the QIAGEN
RNeasy Mini kit (74106), including the optional RNase-free DNase treatment.
cDNA was obtained using the iScript reverse transcription supermix (Bio-Rad
170-8841). qPCR was performed with the primers described in Extended Data
Table 2c using the Power SYBR Green Master mix (Applied Biosystems 4367659)
on the Applied Biosystems 7900HT Fast Real-Time PCR system. Plotted are data
relative to Actb and to the control condition. Data are the average of at least three
biological replicates and statistical analyses were done with the PRISM5 software
using a one-way ANOVA test with Bonferroni correction for multiple hypothesis
testing when more than two groups were compared, or an unpaired t-test with
Welch’s correction when fewer than three groups were compared.
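
As an illustration of how expression "relative to Actb and to the control condition" is commonly computed, the sketch below uses the 2^-ΔΔCt approach. This is an assumption: the text does not state which quantification formula was used, the function name is hypothetical, and the Ct values are invented for the example.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Fold change by the 2^-ddCt method (assumes ~100% primer efficiency).

    ct_target / ct_ref: Ct of the gene of interest and of the reference
    gene (for example Actb) in the sample.
    ct_target_ctrl / ct_ref_ctrl: the same two values in the control condition.
    """
    delta_ct_sample = ct_target - ct_ref
    delta_ct_control = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(delta_ct_sample - delta_ct_control)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(ct_target=26.0, ct_ref=18.0,
                           ct_target_ctrl=24.0, ct_ref_ctrl=18.0)
print(f"fold change vs control: {fold:.2f}")
```

With these invented values the target crosses threshold two cycles later than in the control after normalization, which corresponds to a fourfold reduction.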

Xenopus tropicalis experiments. All experiments on X. tropicalis were executed
in accordance with the guidelines and regulations of Ghent University, Faculty
of Sciences, Belgium. Approval was obtained from the ethical committee of Ghent
University, Faculty of Sciences (permit number EC2017-093). No statistical method
was used to predetermine sample size. No randomization or blinding was used 
during the experimental procedures. 

Xenopus tropicalis whole-mount in situ hybridization. Probes for rnf43 and znrf3 
were designed by amplifying the coding sequence by PCR with primers linked to 
RNA-polymerase sites (Supplementary Table 1). Sense and antisense RNA probes 
were generated by in vitro transcription with the appropriate RNA polymerase 
and digoxigenin-rUTP-label. Whole-mount in situ hybridization was carried out 
as previously described. Imaging was performed with a Leica MZ FLIII
stereomicroscope/Leica DC300F camera.
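
The in situ primers listed in Extended Data Table 2e begin with sequences that match the commonly used T3 and T7 RNA polymerase promoter sequences. The sketch below separates that prefix from the gene-specific part of each primer. The promoter identification and the function name are assumptions made for illustration; the text states only that the primers were "linked to RNA-polymerase sites".

```python
# Widely used promoter sequences (assumed here; the Methods do not name them).
PROMOTERS = {
    "T3": "AATTAACCCTCACTAAAGGG",
    "T7": "TAATACGACTCACTATAGGG",
}

# Primer sequences copied from Extended Data Table 2e.
PRIMERS = {
    "Xt_znrf3-insitu-F": "AATTAACCCTCACTAAAGGGGCTGTGATATTTGATGTGTCTG",
    "Xt_znrf3-insitu-R": "TAATACGACTCACTATAGGGACTTCCACCAACCTCCTG",
    "Xt_rnf43-insitu-F": "AATTAACCCTCACTAAAGGGGGCTTCATTTCCATTGTCAAACTG",
    "Xt_rnf43-insitu-R": "TAATACGACTCACTATAGGGTCCTGCCCATCTGTGAACTC",
}


def split_promoter(primer):
    """Return (promoter_name, gene_specific_sequence) for a tagged primer."""
    for name, promoter in PROMOTERS.items():
        if primer.startswith(promoter):
            return name, primer[len(promoter):]
    return None, primer  # no recognized promoter prefix


for primer_name, sequence in PRIMERS.items():
    promoter, specific = split_promoter(sequence)
    print(f"{primer_name}: {promoter} + {specific}")
```

Under this reading, each amplicon carries one promoter at either end, so transcription from the promoter on the reverse primer would yield the antisense probe and transcription from the forward-primer promoter the sense control.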

Generation of X. tropicalis mosaic mutants by TALEN or CRISPR-Cas9.
TALENs were generated using the Golden Gate Cloning protocol as previously
described and yields were quantified by Nanodrop (ThermoFisher Scientific).
Embryos (two-cell stage) were injected unilaterally with 75 pg rnf43-TALEN-ELD,
rnf43-TALEN-KKR, znrf3-TALEN-ELD and znrf3-TALEN-KKR (Extended Data
Table 2e). Guide RNA targeting rspo2 (Extended Data Table 2e) was designed
with CRISPRScan (http://www.crisprscan.org/) and generated as previously
described, and yield was quantified by Qubit BR RNA assay (ThermoFisher
Scientific). Embryos were injected unilaterally at the two-cell stage with 47 pg rspo2
gRNA and 900 pg NLS-Cas9-NLS (VIB/UGent Protein Service Facility). Both male
and female animals were included in the study once they passed Nieuwkoop stage
59 and were scored as normal, or displaying either amelia (absence of at least
one limb) or polymelia (at least one limb showing signs of duplication) by
stereomicroscopic examination of limbs. For quantitative analysis of genome editing,
nine embryos per injected clutch were pooled and lysed overnight in lysis buffer
(50 mM Tris pH 8.8, 1 mM EDTA, 0.5% Tween-20, 200 µg ml⁻¹ proteinase K) at
55 °C. Genotyping PCRs were performed with the respective primer pairs shown
in Extended Data Table 2f. Targeted deep sequencing of amplicons was performed
as previously described and analysed by the BATCH-GE software package. Indel
frequency data and sequence variants for all targeted deep sequencing are shown
in Supplementary Table 1.
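
The rspo2 target listed in Extended Data Table 2 is 23 nucleotides long. A minimal sketch of how such a site can be split into a 20-nt protospacer and a 3-nt PAM follows. It assumes the listed sequence is written 5' to 3' as protospacer plus an NGG PAM, as is conventional for Streptococcus pyogenes Cas9; the text does not state this explicitly, and the function name is hypothetical.

```python
import re


def split_target(site):
    """Split a 23-nt Cas9 target site into (protospacer, PAM).

    Assumes a 20-nt protospacer followed by an NGG PAM; raises ValueError
    if the sequence does not fit that layout.
    """
    site = site.upper()
    if len(site) != 23:
        raise ValueError(f"expected 23 nt, got {len(site)}")
    protospacer, pam = site[:20], site[20:]
    if not re.fullmatch(r"[ACGT]GG", pam):
        raise ValueError(f"PAM {pam!r} is not NGG")
    return protospacer, pam


# Target sequence copied from Extended Data Table 2 (Xt_rspo2-gRNA).
protospacer, pam = split_target("TGACTCCATAGTATCCAGGAGGG")
print(protospacer, pam)
```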

Skeletal staining of X. tropicalis tadpoles and froglets. For staining of the skeleton 
of the mutant animals, premetamorphic tadpoles and postmetamorphic froglets 
were euthanized using a 0.05% benzocaine solution. Animals were fully eviscer- 
ated, skinned and eyes were removed. Whole-mount alcian blue and alizarin red 
staining was performed as follows: 95% ethanol (4 days, change after 24 h), 100%
acetone (48 h), 0.15% alcian blue 8GX (Sigma-Aldrich A3157) in 76% ethanol/20%
glacial acetic acid/4% H₂O (24 h), 70% ethanol (24 h), 95% ethanol (12 h), 1% KOH
(6 h), 0.05% alizarin red S (Sigma-Aldrich A5533) in 1% KOH (48 h), 1% KOH
(48 h). Imaging was performed with a Leica MZ FLIII stereomicroscope/Leica
DC300F camera. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The data that support the findings of this study are available 
within the paper or from the corresponding authors upon reasonable request. 
The whole exome sequencing data for family 1, and the SNP-array data for F3-II:1 
shown in Extended Data Fig. 1 have been deposited in the NCBI Gene Expression 
Omnibus (GEO) database under accession numbers SRP136052 and GSE111781, 
respectively. 
Extended Data Fig. 1 | Pictures of affected fetuses, exome sequencing
analysis in family 1 and genotyping analysis in family 3. a, Pictures
and radiographs of indicated fetal cases illustrating severe limb defects
in fetuses with HFH-RTRD, and the complete absence of limbs in fetuses
with TETAMS. b, Summary of exome sequencing analysis for family 1,
revealing a single biallelic missense mutation in the RSPO2 gene.
c, Summary of genotyping in family 3 by SNP-array and array-CGH
analysis, revealing a homozygous deletion including exon 6 of RSPO2.
Extended Data Fig. 2 | The RSPO2(R69C) and RSPO2(Q70X) mutants
fail to bind ZNRF3. a, The RSPO2 R69 residue (highlighted in purple) is
highly conserved in vertebrates and within the human paralogues
RSPO1-RSPO4. Conserved cysteine residues of the Furin-like 1 domain
are highlighted in pink. Protein alignment performed with ClustalO.
b, Western blotting of protein extracts and supernatants from HEK293T
cells transfected with indicated constructs. Deletion of the RSPO2
C-terminal domain (RSPO2-ΔC) decreases its retention on the cell
surface without affecting its receptor binding and WNT enhancement
properties. The RSPO2(R69C) mutant was almost undetectable in
conditioned media but was greatly increased by the addition of heparin
in the medium. c, Co-immunoprecipitation of wild-type and mutant
forms of RSPO2-ΔC-AP with the ProteinG-Flag beads only. Asterisk
indicates an unspecific band. d, Co-immunoprecipitation of wild-type
and mutant forms of RSPO2-ΔC-AP with the ZNRF3-ECD-Flag E3
ligase. Asterisk indicates an unspecific band. e, Cell-surface binding assay
of HEK293T cells transfected with empty vector, LGR5 or RNF43, using
equivalent amounts of RSPO2-ΔC-AP conditioned media (western blot).
Experiments in b-e were repeated three times. f, SUPERTOPFLASH
assay in HEK293T-STF cells transfected with WNT3A in the presence of
equivalent amounts of RSPO2-ΔC-AP conditioned media (western blot).
n = 4 biological replicates. Data are mean ± s.e.m. NS, not significant.
**P < 0.01, one-way ANOVA test with Bonferroni’s correction. For gel
source data, see Supplementary Fig. 1.
a  Lgr4+/- Lgr5+/- Lgr6-/- incrosses

Embryos      Expected        Obtained Lgr4    Obtained Lgr5
(n = 82)     n       %       n       %        n       %
wt           20.5    25      19      23.2     20      24.4
het          41      50      48      58.5     37      45.1
hom          20.5    25      15      18.3     25      30.5

Triple hom   Expected: 82/16 = 5.125          Obtained: 5

b  Panel labels: Lgr4+/+ Lgr5+/- Lgr6-/-; Lgr4+/- Lgr5-/- Lgr6-/-; Lgr4-/- Lgr5+/- Lgr6-/-; Lgr4-/- Lgr5-/- Lgr6-/-
Extended Data Fig. 3 | Mouse Lgr4/5/6 knock-in embryos do not
recapitulate the Rspo2 and Rspo3 phenotypes. a, Table indicating the
proportion and numbers of analysed embryos. The different genotypes
were obtained in Mendelian ratio (P > 0.05, χ² test). b, Normal limbs of
embryos with indicated genotypes including a Lgr4/5/6 triple-knockout
embryo. All obtained embryos (see a) had a similar phenotype. FL,
forelimbs; HL, hindlimbs; L, left; R, right. Scale bar, 1 mm. c, Normal lungs
of embryos with indicated genotypes including a Lgr4/5/6 triple-knockout
embryo. Comparable lung length relative to body length is indicated.
Scale bar, 1 mm. All obtained embryos (see a) had a similar phenotype.
d, H&E staining of coronal sections of the heads through the oral cavity.
Scale bar, 1 mm. Close-ups of the palatal shelves illustrate the cleft palate
present in the Lgr5/6 double-knockout (n = 2) and Lgr4/5/6 triple-knockout
(n = 2) embryos (indicated by black arrowheads). e, Properly
vascularized placenta of a Lgr4/5/6 triple-knockout embryo compared
to a Lgr6-knockout embryo. All obtained embryos (see a) had a similar
phenotype. Scale bar, 1 mm.
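
The Mendelian-ratio statement in panel a can be checked with a χ² goodness-of-fit test on the tabulated counts. The sketch below is one way to do so and is not necessarily the authors' exact procedure. It relies on the fact that, for exactly two degrees of freedom, the χ² survival function reduces to exp(-x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math


def chi2_gof_2df(observed, expected):
    """Chi-squared goodness-of-fit for three classes (2 degrees of freedom)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With exactly 2 degrees of freedom, P(X > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


n = 82
expected = [n * 0.25, n * 0.50, n * 0.25]  # wt, het, hom under a 1:2:1 ratio

# Observed counts copied from the table in panel a.
observed_counts = {"Lgr4": [19, 48, 15], "Lgr5": [20, 37, 25]}

for gene, observed in observed_counts.items():
    stat, p = chi2_gof_2df(observed, expected)
    print(f"{gene}: chi2 = {stat:.2f}, P = {p:.2f}")
```

Both P values come out above 0.05, in line with the caption.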
Extended Data Fig. 4 | Mouse Lgr4, Lgr5 and Lgr6 knock-in embryos
recapitulate the knockout phenotypes. a, Illustration of the GFP
knock-in in exon 1 of the Lgr4, Lgr5 and Lgr6 genes, which cause
loss-of-function mutations. b, Liver weight in single Lgr4-/- (n = 2) and
triple Lgr4/5/6-/- (n = 2) compared to wild-type (n = 3) E14.5 embryos.
Data are mean ± s.e.m. NS, not significant. *P < 0.05, one-way ANOVA
test with Bonferroni’s correction. c, Lgr4-/- embryos (n = 4) have a smaller
liver. Liver weight is indicated. Scale bar, 1 mm. d, Lgr4-/- embryos (n = 2)
show female-to-male sex reversal. Blue arrowheads point to male-specific
coelomic vessels. Genetic genders (XY or XX) are indicated. Scale bar,
0.1 mm. e, H&E staining of coronal sections of the heads through the oral
cavity. Scale bar, 1 mm. Close-ups of the tongue illustrate the ankyloglossia
phenotype (tongue attached to the mouth floor, black arrowheads)
in Lgr5-/- embryos (n = 4), whereas the tongue is detached for other
genotypes (white arrowheads).
Extended Data Fig. 5 | Mouse Lgr4, Lgr5 and Lgr6 knock-in cause
loss-of-function mutations. a, b, qPCR analyses for Lgr4, Lgr5 and Lgr6
expression in E14.5 limbs (a) and lungs (b) of wild-type, heterozygous and
homozygous mutant embryos for the respective genes. c, d, qPCR analyses
for Lgr4 and Lgr5 in embryonic intestine (c) and embryonic liver (d) of
embryos with indicated genotype. e, qPCR analyses for Lgr6 in embryonic
and adult skin of wild-type versus homozygous animals. n indicates
number of embryos. Data are mean ± s.e.m. NS, not significant. *P < 0.05,
**P < 0.01, ***P < 0.001, one-way ANOVA test with Bonferroni’s
correction or two-tailed unpaired t-test with Welch’s correction when
fewer than three groups were compared (for Lgr6 qPCR analysis). f, Duplex
RNAscope for the indicated gene (blue) and Rspo2 (pink) in transverse
sections of wild-type E14.5 lungs. Strongly expressed genes are denoted
in bold (summary on the right). Scale bars, 0.2 mm. Experiment repeated
with three different wild-type embryos.
[Panel labels recovered from the figure: e, relative Lgr6 fold change for +/+, +/- and -/- genotypes; f, probes Rnf43, Znrf3 and Rspo2; summary: epithelium, Lgr5, Rnf43, Znrf3; mesenchyme, Rspo2, Lgr4, Znrf3; SMC, Lgr6, Znrf3.]
Extended Data Fig. 6 | Expression analyses in cells derived from mutant
embryos. a, qPCR analyses for Lgr4, Lgr5 and Lgr6 in NPCs derived from
embryos of indicated genotypes. n = 4 biological replicates. b, qPCR
analyses for Lgr4 and Lgr6 in iPS cells derived from NPCs of indicated
genotypes. n = 3 biological replicates. c, qPCR analyses for Lgr4 and
Lgr5 in SV40-immortalized dermal fibroblasts derived from embryos
of indicated genotypes. n = 3 biological replicates. d, qPCR analyses for
Rspo2, Rspo3 and Znrf3 in Lgr4/5/6 triple-knockout SV40-immortalized
fibroblasts, transfected with indicated siRNAs. n = 3 biological
replicates. Data are mean ± s.e.m. NS, not significant. *P < 0.05,
**P < 0.01, ***P < 0.001, one-way ANOVA test with Bonferroni’s
correction.
Extended Data Fig. 7 | Extreme phenotypes of Xenopus tropicalis 
mutant tadpoles. a, Control and rspo2 CRISPR-Cas9-injected tadpole 
showing a complete tetra-amelia phenotype probably due to incomplete 
cleavage at the time of injection and leakage of the CRISPR-Cas9 between 
the two blastomeres (n = 3 froglets). Scale bar, 1 cm. b, Alizarin red and 
alcian blue staining of a double-mutant rnf43/znrf3 TALEN-injected 
tadpole showing complete mirror-image diplopodia of both hindlimbs 
with 10 digits each (n= 4 froglets). Scale bar, 0.2 cm. 
Extended Data Table 1 | Clinical characteristics of affected individuals with biallelic RSPO2 mutations 
Humero-Femoral Hypoplasia with Radio-Tibial Ray Deficiency 


Tetra-amelia syndrome 


Family 1 Family 2 Family 3 Family 4 Family 5 
1-3 1-5 1-6 1-7 u-2 I-3 M4 u-1 I-41 i-2 1-3 
Gender | male fetus | female fetus | female fetus | male fetus | male | male fetus | female fetus | female fetus | male fetus | n/a | female fetus
Termination of pregnancy | 30th gw | 26th gw | 30th gw | 15th gw | n/a | 26th gw | 22nd gw | 20th gw | 20th gw | 13th gw | 12th gw
Year of birth | 2001 | 2009 | 2011 | 2012 | 1989 | 1991 | 2006 | 2013 | 2015 | 2016 | 2017
Age at death | n/a | n/a | n/a | n/a | stillbirth | n/a | n/a | n/a | n/a | n/a | n/a
Origin Turkey Turkey Portugal France India 
Consanguinity yes yes Not reported yes yes 
Case report This paper Basaran et al. , 1994 Sousa et al. , 2008 This paper This paper 
RSPO2 cDNA 154 kb deletion 
c.205C>T c.208C>T c.123delG 
(nM_73568 ee ||, wane 
RSPOZ2 protein % ie sd 
(NP_848660) p.Arg69Cys p.Gin70 nla p.Glu137 p.Gly42Valfs*49 
. parents are parents are 
Zygosity heterozygous homozygous homozygous parents are heterozygous homozygous homozygous homozygous heterozygous homozygous 
Symmetry of the + - + - + + + + + + a 
limb phenotype 
Hypoplastic 
shoulders. 
Hypoplastic upper On the right: 
limb girdles. bowed and short 
Flexion Bilateral severely humerus, single 
; Complete 
contractures of the hypoplastic upper tubular bone on 
(as ae : absence of left 
elbow joints with extremities with the forearm, limb. 
terygia. Short one, markedly thin thumb aplasia, i uppsr im): 
a Short and P Sra 2 ‘i i i " Complete Complete Complete Complete Rudimentary hypoplastic right 
Upper limbs arms with a single tubular bone, with — ending with two Amelia 3 
underdeveloped " 7 absence absence absence absence appendages arm with 
tubular bone, bilateral aplasia of finger like 
absence of 
ending with a thumbs and two appendages. 
; 5 : 7 forearm and 
single finger-like finger-like On the left: short h 
: 7 7 and. 
appendage with a appendages. arm with a single 
well-formed nail tubular bone, 
structure. ending with a 
single finger like 
appendage. 
Flexion 
contractures of the 
elbow joints with 
pterygia. Hypoplastic pelvis, 
Mesomelic Severely severely 
shortness of the hypoplastic lower hypoplastic lower Both femur 
legs with tibial limbs with single, limbs with a single f " ; seen, absence Absence of 
Lower limbs an Pree ed hemimelia. markedly thin tubular bone, Amelia Belle come da ena Daina of lower lower extrimities 
Pr Bilateral clubfeet, tubular bone. Two three toes on the PP 9 PP 9 extermities below the knee. 
bilateralabsence toes onthe feet _left and two toes below the knees. 
of halluces with bilaterally. on the right, pes 
three toes on the equinovarus. 
right foot (digit | 
and digit II 
missing). 
Dysmorphic right 
ear, 
Bilateral cleft lip- hypertelorism, 
palate, bilateral bilateral cleft lip, Severe mirco- 
lung agenesis, complete cleft retrognathia, 
bilateral : Bilateral cleft lip- palate and severe Ultrasound unilateral cleft lip 
a palpebral fusion, micrognathia. feat (left), posterior 
Hypoplastic fi ‘i palate, examination: 
micrognathia, ‘ - cleft palate, 
scapulae, a microretrognatism. : 
. unfused " Short frenulum glossoptosis. 
rudimentary Severely z Bilateral cleft lip- * Severe 
j : maxillary 5 F with the tongue - e 
triangular bone on hypoplastic pelvis a palate, bilateral | Short frenulum with micrognathia. 
Cleft lip-palate, processes, 4 tethered to the Complete 
the left, structure. lung agenesis, | the tongue tethered és 
Others nla ‘ nla severe Fi floor of the mouth : agenesis of both 
diaghragmatic Prominent ts hyperecogenic to the floor of the Heart fills up ; 

a F no autopsy. mandibular 4 (ankyloglossia). lungs and blind 
hernia on glabella, mild h lasi focus at the right mouth most of the di ; 
antenatal retrognathia Teepe. ventricle. (ankyloglossia). chest. engin malty 

ultrasound at the a absence of . Complete 7 bronchi. 
27th week. nipples, Bilateral lung agenesis of both Stomach bubble Agenesis of 
small penis, . lungs and blind branches of 
agenesis ; 5 seen. 
absence of ending main pulmonary atery. 


scrotum (testes 
intra-abdominal), 
heart defects. 


bronchi. Agenesis 
of branches of 
pulmonary atery. 
Hypoplastic 
pulmonary veins. 


Hypoplastic 


pulmonary veins. 


n/a, not applicable.
Extended Data Table 2 | List of primers 


a_Human genotyping primers 


RSPO2 -exon2-F 5’-CGGCTCGTGCTAGGCAGT-3’ 
RSPO2-exon2-R 5'-CAGGGTCCTAAAGGTGGGGA-3’ 
RSPO2-exon3-F 5'-GACATCCCCATGAGCCA-3’ 
RSPO2-exon3-R 5’-CACCCAGCAAGCTTAAACT-3’ 
RSPO2-exon4-F 5’-GTTGAAAGAGACAGGGATGACT-3’ 
RSPO2-exon4-R 5'-CAGTTCTACTGAACAAGAGAACCA-3’ 
RSPO2-exon5-F 5'-GGTCTTCAAGGCTGTACCACT-3’ 
RSPO2-exon5-R 5’-GAAGCACACAGCACACAGT-3’ 
RSPO2-exon6-F 5’-GGTGATGTTTTCCAGATGGGCT-3’ 
RSPO2-exon6-R 5'-CTGGGAACAGATACTGGGCA-3’ 


b Mouse genotyping primers 


Lgr4 -WT-F 5'-TGCAACCCTAGAAGGGAAAA-3’ 
Lgr4 -WT-R 5'-CTCACAGTGCTTGGGTGAAG-3’ 
Lgr4 -null-F 5'-GCCTGCATTACCGGTCGATGCAACGA-3’ 
Lgr4 -null-R 5'-CTCACAGTGCTTGGGTGAAG-3’ 
Lgr5 -WT-F 5'-ACATGCTCCTGTCCTTGCT-3’ 
Lgr5-WT-R 5'-GTAGGAGGTGAAGACGCTGA-3’ 
Lgr5 -null-F 5'-CACTGCATTCTAGTTGTGG-3’ 
Lgr5 -null-R 5'-CGGTGCCCGCAGCGAG-3’ 

Lgr6 -WT-F 5'-CGCTCGCCCGTCTGAGC-3’ 
Lgr6-WT-R 5'-GCGTCCAGGTCCGCAGG-3’ 
Lgr6 -null-F 5'-CGCTCGCCCGTCTGAGC-3’ 
Lgr6 -null-R 5'-CCTGGACGTAGCCTTCGGGC-3’ 


c Mouse qPCR primers 


Lgr4-qPCR-F 5'-GCCTTCACCCAAGCACTG-3'
Lgr4-qPCR-R 5'-CAGCCAGTTGTAGCTCCTCT-3'
Lgr5-qPCR-F 5'-ACAACCCCATCCAATTTGTTG-3'
Lgr5-qPCR-R 5'-CGAGGCACCATTCAAAGTCA-3'
Lgr6-qPCR-F 5'-CCCTGACTATGCCTTCCAGA-3'
Lgr6-qPCR-R 5'-ATGCTGGATGCGGTTGTTAT-3'
Rnf43-qPCR-F 5'-GGCCTATGTGTGGATTGAGC-3'
Rnf43-qPCR-R 5'-TGAGGCCAGGATGATCACAA-3'
Znrf3-qPCR-F 5'-CATCCGACTGTGCCATCTGT-3'
Znrf3-qPCR-R 5'-GCCATGGATCCACACACTTC-3'
Axin2-qPCR-F 5'-GAGTGGACTTGTGCCGACTTCA-3'
Axin2-qPCR-R 5'-GGTGGCTGGTGCAAAGACATAG-3'
Rspo1-qPCR-F 5'-ATACTTTGATGCCCGCAACC-3'
Rspo1-qPCR-R 5'-CTCACAGTGCTCGATCTTGC-3'
Rspo2-qPCR-F 5'-CGAGCCCCAGATATGAACAG-3'
Rspo2-qPCR-R 5'-AAAAGCCTACTTTGCACTTCG-3'
Rspo3-qPCR-F 5'-TGTGTCTCTCTTCGTGTCCA-3'
Rspo3-qPCR-R 5'-AGGTATCACAGTCAACTTTGCA-3'
Rspo4-qPCR-F 5'-GGACATGCTCGCCCTGTA-3'
Rspo4-qPCR-R 5'-GAACAGCCATTCTCCTCCGA-3'
Actin-qPCR-F 5'-AAGGCCAACCGTGAAAAGAT-3'
Actin-qPCR-R 5'-GTGGTACGACCAGAGGCATAC-3'


d Target sequences for Xenopus tropicalis rnf43 and znrf3 TALENs, and rspo2 gRNA 


Xt_rnf43-TALEN-ELD 5'-TGCTCACGGTGACTCTC-3'
Xt_rnf43-TALEN-KKR 5'-CCATGGGCACCACGGAA-3'
Xt_znrf3-TALEN-ELD 5'-TTTTTCGTGGTGGTGTC-3'
Xt_znrf3-TALEN-KKR 5'-CTCCTTATCAAGATCAA-3'
Xt_rspo2-gRNA 5'-TGACTCCATAGTATCCAGGAGGG-3'


e PCR primers for Xenopus tropicalis in situ probes 


Xt_znrf3 -insitu-F 5'-AATTAACCCTCACTAAAGGGGCTGTGATATTTGATGTGTCTG-3’ 
Xt_znrf3 -insitu-R 5'-TAATACGACTCACTATAGGGACTTCCACCAACCTCCTG-3’ 
Xt_rnf43 -insitu-F 5'-AATTAACCCTCACTAAAGGGGGCTTCATTTCCATTGTCAAACTG-3’ 
Xt_rnf43 -insitu-R 5'-TAATACGACTCACTATAGGGTCCTGCCCATCTGTGAACTC-3’ 


f Xenopus tropicalis genotyping primers 


Xt_rnf43-F 5'-CCACACCCCAACAAAATCA-3'
Xt_rnf43-R 5'-CCACACCCCAACAAAATCA-3'
Xt_znrf3-F 5'-ACAGCATGCCTTCCCTACAC-3'
Xt_znrf3-R 5'-GTAGGTTGCTGCCAAATCTCAC-3'
Xt_rspo2-F 5'-GTCGTGTTGAAATGGTGCGG-3'
Xt_rspo2-R 5'-GTTCCTTGACAAGTATCCAAGCTG-3'


a, Genotyping primers for human RSPO2. b, Genotyping primers for mouse Lgr4, Lgr5 and Lgr6 wild-type and knock-in alleles. c, Mouse qPCR primers used in this study. d, Target sequences for X. 
tropicalis rnf43 and znrf3 TALENs, and rspo2 gRNA. e, PCR primers for X. tropicalis in situ probes via cloning-free methodology. f, Genotyping primers for X. tropicalis rnf43 and znrf3 TALENs, and rspo2 
CRISPR-Cas9 target sites. 
Mxra8 is a receptor for multiple arthritogenic alphaviruses
Arthritogenic alphaviruses comprise a group of enveloped RNA 
viruses that are transmitted to humans by mosquitoes and cause 
debilitating acute and chronic musculoskeletal disease. The host 
factors required for alphavirus entry remain poorly characterized. 
Here we use a genome-wide CRISPR-Cas9-based screen to identify 
the cell adhesion molecule Mxra8 as an entry mediator for multiple 
emerging arthritogenic alphaviruses, including chikungunya, Ross 
River, Mayaro and O’nyong nyong viruses. Gene editing of mouse 
Mxra8 or human MXRA8 resulted in reduced levels of viral infection 
of cells and, reciprocally, ectopic expression of these genes resulted 
in increased infection. Mxra8 bound directly to chikungunya virus 
particles and enhanced virus attachment and internalization into 
cells. Consistent with these findings, Mxra8-Fc fusion protein or 
anti-Mxra8 monoclonal antibodies blocked chikungunya virus 
infection in multiple cell types, including primary human synovial 
fibroblasts, osteoblasts, chondrocytes and skeletal muscle cells. 
Mutagenesis experiments suggest that Mxra8 binds to a surface- 
exposed region across the A and B domains of chikungunya virus 
E2 protein, which are a speculated site of attachment. Finally, 
administration of the Mxra8-Fc protein or anti-Mxra8 blocking 
antibodies to mice reduced chikungunya and O’nyong nyong virus 
infection as well as associated foot swelling. Pharmacological 
targeting of Mxra8 could form a strategy for mitigating infection 
and disease by multiple arthritogenic alphaviruses. 

We performed a genome-wide screen for host factors required for
chikungunya virus (CHIKV) infection, using the CRISPR-Cas9 system
and lentiviruses delivering single-guide RNAs (sgRNA) targeting
20,611 mouse genes (Extended Data Fig. 1a). We inoculated
lentivirus-transduced 3T3 mouse fibroblasts with CHIKV strain 181/25
that contained the mKate2 reporter (CHIKV-181/25-mKate2), such
that almost all cells expressed the reporter gene by 24 h. The few cells
that lacked mKate2 expression were sorted, propagated in the
presence of neutralizing anti-CHIKV monoclonal antibodies (mAbs)
and then re-inoculated with CHIKV-181/25-mKate2. After two
rounds of infection and sorting, genomic DNA from mKate2-negative
cells was collected, sgRNAs were sequenced and then analysed using
MAGeCK (Supplementary Tables 1, 2). The top candidate was Mxra8
(also known as DICAM, ASP3 or limitrin), an adhesion molecule found
in mammals, birds and amphibians (Extended Data Fig. 1b, c) that is
expressed on epithelial, myeloid and mesenchymal cells and shares
homology with junctional adhesion molecule, a reovirus entry
receptor. We validated Mxra8 using three different sgRNAs in bulk 3T3
cells, by generating ΔMxra8 single-cell clones in 3T3 and MEF cells,
and by confirming gene deletion and cell viability (Extended Data
Fig. 2a-e). Infection of CHIKV-181/25 was reduced in ΔMxra8 cells,
and trans-complementation of Mxra8 in ΔMxra8 3T3 cells restored
infectivity (Fig. 1a, b). As CHIKV-181/25 is a cell-culture-adapted
vaccine strain that has acquired heparan sulfate binding activity,
we evaluated Mxra8 with other CHIKV strains. Infection of
CHIKV-AF15561 (the parental Asian strain of CHIKV-181/25, which binds
poorly to heparan sulfate) and CHIKV-37997, a West African genotype
strain, was abolished in ΔMxra8 3T3 cells, reduced in ΔMxra8
MEF cells (Fig. 1a) and restored in trans-complemented ΔMxra8 3T3
cells (Fig. 1b, c). However, the dependence on Mxra8 was less with
CHIKV-LR 2006, a strain of the East/Central/South African genotype
(Fig. 1a, d). To confirm that CHIKV required Mxra8 independently
of heparan sulfate binding, we expressed murine Mxra8 in parental or
glycosaminoglycan-deficient Chinese hamster ovary cells (Extended
Data Fig. 3a). Expression of Mxra8 enhanced infectivity of CHIKV
regardless of whether Chinese hamster ovary cells expressed heparan
sulfate or other glycosaminoglycans (Extended Data Fig. 3b, c).
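
To make the logic of the enrichment screen concrete, the sketch below ranks genes by the enrichment of their sgRNAs in the sorted, reporter-negative population relative to the starting library. It is a deliberately simplified stand-in and not the MAGeCK algorithm used in the study: it only normalizes read counts, computes a log2 fold change per sgRNA and takes the median per gene. All counts, guide names and function names are hypothetical.

```python
import math
from statistics import median


def log2_fold_changes(sorted_counts, input_counts, pseudocount=1.0):
    """Per-sgRNA log2 fold change of reads-per-million, sorted vs input."""
    total_sorted = sum(sorted_counts.values())
    total_input = sum(input_counts.values())
    lfc = {}
    for guide, n_input in input_counts.items():
        rpm_sorted = sorted_counts.get(guide, 0) / total_sorted * 1e6
        rpm_input = n_input / total_input * 1e6
        lfc[guide] = math.log2((rpm_sorted + pseudocount) / (rpm_input + pseudocount))
    return lfc


def rank_genes(lfc, guide_to_gene):
    """Rank genes by the median log2 fold change of their sgRNAs."""
    per_gene = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append(value)
    scores = [(median(values), gene) for gene, values in per_gene.items()]
    return sorted(scores, reverse=True)


# Hypothetical read counts, for illustration only.
guide_to_gene = {"Mxra8_1": "Mxra8", "Mxra8_2": "Mxra8", "Mxra8_3": "Mxra8",
                 "GeneA_1": "GeneA", "GeneA_2": "GeneA", "GeneA_3": "GeneA"}
input_counts = {guide: 100 for guide in guide_to_gene}
sorted_counts = {"Mxra8_1": 900, "Mxra8_2": 800, "Mxra8_3": 700,
                 "GeneA_1": 10, "GeneA_2": 20, "GeneA_3": 5}

ranking = rank_genes(log2_fold_changes(sorted_counts, input_counts), guide_to_gene)
for score, gene in ranking:
    print(f"{gene}: median log2FC = {score:.2f}")
```

Requiring several independent sgRNAs per gene to agree, as the median does here, reduces the chance that a single off-target guide produces a false hit.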

We tested the requirement of Mxra8 for infection by other alphaviruses.
Whereas Mayaro, Ross River, O’nyong nyong and Barmah Forest
arthritogenic alphaviruses showed reduced infection in ΔMxra8 3T3
cells, Semliki Forest and Getah viruses had partial phenotypes, and
other related alphaviruses (Sindbis, Bebaru, Una and Middleburg
viruses) showed little dependence on Mxra8 (Fig. 1e and Extended
Data Fig. 4). Minimal differences in infection were observed between
control and ΔMxra8 3T3 cells with chimaeric Sindbis viruses expressing
the structural genes of the Eastern or Western equine encephalitis
alphaviruses, or a Venezuelan equine encephalitis virus that expressed
GFP (Fig. 1e). No effect of Mxra8 was seen on infection of unrelated
positive- or negative-sense RNA viruses (Fig. 1f). We next assessed
whether any of the four isoforms of the human MXRA8 orthologue
(Extended Data Fig. 1b) served a similar function. As HeLa cells did
not express MXRA8 (Extended Data Fig. 5), we used these cells for
ectopic expression (Extended Data Fig. 6a). MXRA8-1, MXRA8-2 and
MXRA8-4 (but not MXRA8-3) were detected on the cell surface,
and MXRA8-1 and MXRA8-2 enhanced CHIKV infectivity (Fig. 1g).
Similarly, expression of MXRA8-2 in A549 or 293T cells resulted in
greater CHIKV infection (Extended Data Fig. 6b, c). Consistent with
this observation, expression of different sgRNAs targeting all isoforms
of MXRA8 in MRC-5 human lung fibroblasts, HFF-1 foreskin
fibroblasts, RPE retinal pigment epithelial cells and Hs 633T fibrosarcoma
cells resulted in less CHIKV infection than in control gene-edited cells
(Fig. 1h and Extended Data Fig. 7a, b).

To determine whether Mxra8 is required for replication, we
transfected CHIKV genomic RNA into control and ΔMxra8 MEF cells in
the presence of NH₄Cl to inhibit virus maturation and further rounds
of infection. As no difference in CHIKV gene expression was detected
(Fig. 2a), Mxra8 does not appear to affect translation or replication. We
demonstrated an Mxra8-dependence of the structural proteins using
pseudotyped viruses. Whereas infection of CHIKV pseudotyped
virions that encapsidated a murine leukaemia virus GFP-reporter RNA
Fig. 1 | Mxra8 is required for optimal infection of CHIKV and other
alphaviruses. a, ΔMxra8 or control 3T3 or MEF cells were inoculated
with CHIKV and stained for E2 protein (3 experiments, n = 9; two-tailed
t-test with Holm-Sidak correction; mean ± s.d.). b-d, Multi-step growth
curves with CHIKV-181/25 (b), CHIKV-AF15561 (c) or CHIKV-LR-2006
(d) in control, ΔMxra8 or Mxra8 trans-complemented ΔMxra8 3T3 cells
(3 experiments, n = 9; mean ± s.d.). FFU, focus forming units.
e, ΔMxra8 or control 3T3 cells were inoculated with alphaviruses and
processed for E2 or reporter gene expression (3 or more experiments,
n = 6 except for Semliki Forest virus (SFV), Sindbis-Western equine
encephalitis virus chimaera (WEEV) and Sindbis-Eastern equine
encephalitis virus chimaera (EEEV) in which n = 18; two-tailed t-test
with Holm-Sidak correction; mean ± s.d.). f, ΔMxra8 or control 3T3 cells


was reduced in ΔMxra8 compared to control MEF cells, infection of
Eastern or Western equine encephalitis virus pseudotyped virions was
not (Fig. 2b). Consistent with a role for Mxra8 in the entry pathway, at
4 °C CHIKV-AF15561 showed reduced binding to ΔMxra8 compared
to control MEF cells, and increased binding to cells overexpressing
Mxra8 (Fig. 2c, d). When virus internalization assays were performed
at 37 °C, less CHIKV RNA was measured within ΔMxra8 MEF cells,
and more CHIKV RNA was detected in cells overexpressing Mxra8
(Fig. 2c).

To corroborate an effect of Mxra8 on binding and entry, we
generated Fc fusion proteins with the extracellular domains of mouse
Mxra8 (Mxra8-Fc) or human MXRA8-2 (MXRA8-2-Fc), along with
a control osteoprotegerin protein (OPG-Fc) (Extended Data Fig. 8a).
Pre-incubation with Mxra8-Fc or MXRA8-2-Fc, but not the control
OPG-Fc, reduced CHIKV-181/25 infection in 3T3 (Fig. 2e) and MRC-5
(Fig. 2f) cells. We tested a panel of hamster mAbs against mouse Mxra8
(Extended Data Fig. 8b) for their capacity to inhibit CHIKV infection.
Seven mAbs bound to mouse Mxra8-Fc, with four also recognizing
human MXRA8-2-Fc (Extended Data Fig. 8c). Pre-treatment of 3T3
cells with anti-Mxra8 mAbs reduced CHIKV infection (Fig. 2g). Three
of the mAbs that bound human MXRA8-2-Fc also reduced infection
in MRC-5 cells (Extended Data Fig. 8d). To establish the importance
of the Mxra8 ectodomain, we engineered forms with
glycophosphatidylinositol anchors (Mxra8-GPI) or lacking a cytoplasmic domain
(Mxra8-ΔC-tail) for trans-complementation (Extended Data Fig. 9).
Notably, Mxra8-GPI and Mxra8-ΔC-tail restored CHIKV infection
were inoculated with indicated viruses and processed for viral antigen
or reporter gene expression (3 experiments, mean ± s.d.). g, HeLa cells
were transduced with control or MXRA8-1, MXRA8-2, MXRA8-3 or
MXRA8-4 alleles, inoculated with CHIKV and processed for E2 staining
(3 experiments, n = 6; one-way ANOVA with Dunnett’s test; mean ± s.d.).
h, Human MRC-5 cells depleted of MXRA8 with two different sgRNAs
were inoculated with CHIKV, and E2 expression was analysed (3
experiments, n = 9; one-way ANOVA with Dunnett’s test; mean ± s.d.).
EMCV, encephalomyocarditis virus; MAYV, Mayaro virus; ONNV,
O’nyong nyong virus; RRV, Ross River virus; RVFV, Rift Valley fever virus;
VEEV, Venezuelan equine encephalitis virus; VSV, vesicular stomatitis
virus; WNV, West Nile virus. *P < 0.05; **P < 0.01; ***P < 0.001;
****P < 0.0001; NS, not significant.


(Fig. 2h). Interaction with the Mxra8 ectodomain may facilitate viral
glycoprotein conformational changes that are required for
internalization or fusion or potentiate interactions with other host factors
that bridge membrane penetration and entry. Finally, heterologous
expression of human MXRA8-2 in ΔMxra8 mouse cells also restored
CHIKV infection (Fig. 2h).

To determine whether Mxra8 directly binds to CHIKV, we captured
virions or virus-like particles with a human anti-CHIKV mAb, and added
Mxra8-Fc or MXRA8-2-Fc in an enzyme-linked immunosorbent assay. Both
Mxra8-Fc and MXRA8-2-Fc, but not OPG-Fc, bound to CHIKV virions and
virus-like particles (Fig. 3a, b). By comparison, Mxra8-Fc and MXRA8-2-Fc
did not bind efficiently to Eastern equine encephalitis virus particles derived
from a chimaera with Sindbis virus (Fig. 3c). In a complementary assay, we
assessed binding of Mxra8-Fc protein to cell-surface-displayed alphavirus
proteins on infected cells. Mxra8-Fc bound cells infected with CHIKV,
O’nyong nyong virus, Mayaro virus and Ross River virus, but not cells infected
with Sindbis virus or Venezuelan equine encephalitis virus (Extended Data
Fig. 10a, b). We also analysed binding of Mxra8 to CHIKV virus-like particles
by surface plasmon resonance, and found a slow association rate, a long half-life
and an affinity of about 200 nM (Fig. 3d).
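
The three surface plasmon resonance quantities mentioned above are linked by simple relations under a 1:1 binding model, which is an assumption here since the fitting model is not stated: K_D = k_off / k_on, the complex half-life is ln 2 / k_off, and the equilibrium fraction of sites occupied at ligand concentration C is C / (C + K_D). The sketch below illustrates them. The rate constants are hypothetical values chosen only so that K_D comes out at 200 nM; they are not taken from the paper.

```python
import math


def kd_from_rates(k_on_per_M_s, k_off_per_s):
    """Equilibrium dissociation constant (in M) for 1:1 binding."""
    return k_off_per_s / k_on_per_M_s


def half_life_s(k_off_per_s):
    """Half-life of the complex in seconds."""
    return math.log(2) / k_off_per_s


def fraction_bound(conc_nM, kd_nM):
    """Equilibrium occupancy at a given free ligand concentration."""
    return conc_nM / (conc_nM + kd_nM)


# Hypothetical rate constants (slow on-rate, slow off-rate).
k_on = 1.0e3   # M^-1 s^-1
k_off = 2.0e-4  # s^-1

kd_nM = kd_from_rates(k_on, k_off) * 1e9
print(f"K_D = {kd_nM:.0f} nM")
print(f"half-life = {half_life_s(k_off) / 60:.0f} min")
print(f"occupancy at 200 nM = {fraction_bound(200.0, 200.0):.2f}")
```

The example shows how a slow association rate and a long half-life can coexist with a moderate affinity, because K_D depends only on the ratio of the two rate constants.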

We evaluated whether human anti-CHIKV mAbs that bound 
epitopes within the E2 protein altered Mxra8-Fc binding to CHIKV. 
Several mAbs recognizing epitopes in the A domain (2H1, 8G18, 
3E23 and 106) and mAbs recognizing shared epitopes in the A and 
B domains (1H12 and 4J14) inhibited binding, whereas other mAbs 
that localize to distinct sites had less effect (Fig. 3e). We then tested 
Fig. 2 | Mxra8 modulates CHIKV attachment and internalization. 
Fig. 4 | Mxra8 contributes to alphavirus pathogenesis. a, Surface
expression of MXRA8 on primary human keratinocytes, dermal
fibroblasts, synovial fibroblasts, osteoblasts, chondrocytes and skeletal
muscle cells. One experiment of three is shown. b, Primary human cells
were pre-incubated with anti-MXRA8 blocking mAbs before addition of
CHIKV-AF15561 and processed for E2 staining (3 experiments, n = 9;
one-way ANOVA with Dunnett’s test). c, d, Mxra8-Fc or JEV-13 mAb
were incubated with CHIKV-AF15561 for 30 min before subcutaneous
inoculation. c, At 12, 24 and 72 h, CHIKV was measured in the
ipsilateral ankle and calf muscle. d, At 72 h, foot swelling was measured
(2 experiments, n = 10; median viral titres: two-tailed Mann-Whitney
test; mean foot swelling: two-tailed unpaired t-test). e, Mxra8-Fc or
JEV-13 mAb was mixed with O’nyong nyong virus immediately before
subcutaneous inoculation. At 12 h, O’nyong nyong virus was measured

the binding of Mxra8-Fc to an alanine scanning mutagenesis library 
of E2 A and B domains in the context of display on the surface of 
293T cells!®*°. Residues W64, D71, T116 and I121 in the A domain, 
and 1190, Y199 and 1217 in the B domain, of E2 emerged as essential 
for optimal Mxra8-Fc binding (Fig. 3f and Supplementary Table 3) 
and overlap the binding sites of the blocking mAbs that we tested”. 
Mapping of these residues onto the p62 (E2 precursor)-E1 hetero- 
dimer or the trimer of heterodimers (Extended Data Fig. 10c, d) on 
the virion surface”! revealed a solvent-accessible epitope across the 
top of the A and B domains, which is a proposed site of alphavirus 
receptor engagement”? : 

To begin to assess the physiological importance of MXRA8 inter- 
action with CHIKV, we tested surface expression of MXRA8 on pri- 
mary human keratinocytes, dermal fibroblasts, synovial fibroblasts, 
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in the ipsilateral ankle (2 experiments, n = 10; two-tailed unpaired f-test; 
median values). f, g, Mxra8—Fc or JEV-13 mAb was administered via 

an intraperitoneal route 6h before CHIKV-AF15561 inoculation in the 
footpad. At 24h, CHIKV was measured in the ipsilateral ankle (f). At 72h, 
foot swelling was measured (g) (2 experiments, n = 10; median viral titres: 
two-tailed Mann-Whitney test; mean foot swelling: two-tailed unpaired 
t-test). h-j, Pairs of anti-Mxra8 mAbs or isotype control hamster mAbs 
were administered via an intraperitoneal route 12h before (h, i) or 8 or 
24h after (j) inoculation of CHIKV-AF15561. At 12 (h) and 72 

(h, j) h, CHIKV was measured. At 72h, foot swelling (i) was measured 

(2 experiments; h left, i: n = 10; one-way ANOVA with Dunnett's test; 

h middle and right: n = 10; Kruskal-Wallis with Dunn’s test; j: n = 8; 
two-tailed Mann-Whitney test). Ipsilat, ipsilateral; contra, contralateral. 
*P< 0.05; **P< 0.01; ***, P< 0.001; ****P < 0.0001. 


osteoblasts, chondrocytes and skeletal muscle cells (Fig. 4a), all of 
which are targets of infection by alphaviruses™. Pretreatment with 
anti- MXRA8 blocking mAbs inhibited infection of CHIKV-AF15561 
in all cells but keratinocytes, which lack MXRA8 expression (Fig. 4a, b). 
We next evaluated whether co-injection of Mxra8-Fc with 
CHIKV-AF15561 would diminish infection in C57BL/6 mice. The 
addition of Mxra8-Fc diminished CHIKV infection in the ipsilat- 
eral ankle and muscle (Fig. 4c) and reduced foot swelling (Fig. 4d) 
compared to a control mAb. Treatment with Mxra8-Fc also inhib- 
ited Onyong nyong virus infection in the ipsilateral ankle of mice 
(Fig. 4e). We next administered Mxra8-Fc via an intraperitoneal 
route, and 6h later we inoculated CHIKV in the footpad. Mxra8-Fc 
treatment reduced foot swelling and viral burden in the ipsilateral 
ankle (Fig. 4f, g), although the phenotype was less pronounced than 
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in co-injection experiments. To extend these findings, we transferred 
hamster anti-Mxra8 blocking or control mAbs to mice via an intra- 
peritoneal route 12h before CHIKV inoculation. Reduced CHIKV 
titres were observed in the ipsilateral ankle and calf muscle, and 
contralateral ankle at 12 and 72h after infection in anti-Mxra8 com- 
pared to control mAb-treated mice (Fig. 4h). Treatment with anti- 
Mxra8 mAbs also reduced foot swelling (Fig. 4i). In post-exposure 
therapeutic experiments, we observed reduced CHIKV infection in the 
contralateral ankle and muscle when anti-Mxra8 mAbs were admin- 
istered at 8 or 24h after virus inoculation (Fig. 4j and Extended Data 
Fig. 8e). These in vivo experiments establish a function for Mxra8 in 
the pathogenesis of infection of arthritogenic alphaviruses. 

Our studies establish that mouse Mxra8 contributes to CHIKV entry
and is required for infection and disease. Human MXRA8 also bound
CHIKV and supported infection, and MXRA8 expression in primary
human cells overlapped with the tropism of CHIKV in vivo. Infection of
several arthritogenic alphaviruses, including CHIKV, O’nyong nyong
virus, Mayaro virus and Ross River virus, was reduced in ΔMxra8
cells, which suggests that Mxra8 may serve as a shared receptor. Our
data contrast with those relating to natural resistance-associated macrophage
protein (NRAMP2), which is an entry receptor for Sindbis virus
but not for CHIKV or Ross River virus. Nonetheless, residual CHIKV
infection in the absence of Mxra8 in cells and in mice, and the absence
of an apparent mosquito orthologue, suggest that additional unidentified
host factors contribute to cell binding and entry.

Our mutagenesis mapping studies suggest that amino acids in the E2 
A and B domains contribute to the interaction of CHIKV with Mxra8. 
Higher-resolution structural experiments are needed to define the com- 
plete footprint of binding between Mxra8 and CHIKV E2 protein. Such 
studies could facilitate the development of small molecules or biological 
agents that disrupt Mxra8 interaction with E2 protein, which could 
form the basis of therapeutic strategies for the amelioration of diseases 
caused by multiple emerging alphaviruses. 


Online content 

Any Methods, including any statements of data availability and Nature Research 
METHODS 

Cells and viruses. Vero, NIH-3T3, MEF, HEK 293T, A549, HeLa (ATCC #CCL-2), 
MRC-5 (provided by D. Wang, Washington University), HFF-1 (ATCC #SCRC- 
1041), Hs 633T (Sigma-Aldrich #89050201), Huh7, RPE (provided by M. Mahjoub, 
Washington University), JEG3 (provided by I. Mysorekar, Washington University), 
U2OS (provided by S. Cherry, University of Pennsylvania), HT 1080 (provided 
by J. Cooper, Washington University), Raji and K562 cells all were cultured at 
37°C in Dulbecco's Modified Eagle medium supplemented with 10% fetal bovine 
serum (FBS), 10 mM HEPES, 1mM sodium pyruvate, 1× non-essential amino 
acids, and 100 U/ml of penicillin-streptomycin. HTR8/SV.neo cells (provided by 
I. Mysorekar, Washington University) were cultured in RPMI 1640 supplemented 
with 5% FBS and 1% penicillin and streptomycin. Jurkat cells were cultured at 
37°C in RPMI 1640 supplemented with 10% FBS and 10mM HEPES. hCMEC/D3
cells (provided by R. Klein, Washington University) were cultured in EBM-2
medium (Lonza, 00190860) supplemented with 5% FBS, 5 μg/ml ascorbic acid,
10mM HEPES, 1% lipid concentrate (Gibco, 11905-031) in plates pre-coated with 
collagen. All cell lines were tested and found to be free of mycoplasma contamina- 
tion using a commercial kit. Cell lines were not authenticated. 

Primary human keratinocytes (#102-05n), synovial fibroblasts (#408-05a), osteoblasts
(#406-05f), chondrocytes (#402-05f) and skeletal muscle cells (#S150-05f)
were purchased from Cell Applications. Primary human dermal fibroblasts (#CC- 
2509) were obtained from Lonza. Cells were thawed and cultured in specified 
medium according to the instructions of the manufacturers, and used within one 
week. 

The following alphaviruses were used: CHIKV (strains 2006 La Reunion OPY1,
37997, AF15561, 181/25, and 181/25-mKate2 (rescued from pJM6 CHIKV-181/25
mKate2 cDNA clone, provided by T. Morrison and M. Heise)), RRV (T48), MAYV
(BeH407), ONNV (MP30), SFV (Kumba), SINV (Toto1101, Girdwood), Bebaru
virus (BEBV, MM 2354), Middleburg virus (MIDV, 30037), Getah virus (GETV,
AMM-2021), Una virus (UNAV, CoAr2380) and Barmah Forest virus (BFV,
K10521). Additional viruses tested included chimaeric encephalitic alphaviruses
(SINV-EEEV and SINV-WEEV), VEEV-GFP (TC-83), a flavivirus (WNV,
New York 2000), a bunyavirus (RVFV-GFP, MP-12), a rhabdovirus (VSV-GFP,
Indiana) and a picornavirus (encephalomyocarditis virus, EMCV). Replication-competent
SINV chimaeric viruses were constructed by replacing the SINV TR339
structural protein genes with EEEV FL93-939 or WEEV Fleming structural protein
genes under control of the SINV subgenomic promoter in the TR339 cDNA
clone. All viruses were propagated in Vero cells and titrated by standard plaque
or focus-forming assays.

Pooled sgRNA screen and data analysis. A GeCKO v2 CRISPR knockout pooled
library encompassing 130,209 different sgRNAs against 20,611 mouse genes was
made available by F. Zhang (Addgene #1000000053), and amplified in Endura cells
(Lucigen #60242) as previously described. The sgRNA library was divided in
half (A + B), packaged into lentiviruses and used for independent screening. The
sgRNA plasmid library was packaged in 293FT cells after co-transfection with
psPAX2 (Addgene #12260) and pMD2.G (Addgene #12259) at a ratio of 2:2:1
using FugeneHD (Promega). Approximately 48 h after transfection, supernatants
were collected, clarified by centrifugation (3,500 r.p.m. × 20 min) and aliquotted
for storage at −80°C.

For the CRISPR screen, a clonal 3T3-Cas9 cell line was generated by trans- 
duction with a packaged lentivirus (lentiCas9-Blast, Addgene #52962), blas- 
ticidin selection and limiting dilution. 3T3-Cas9 cells were expanded and 
transduced with CRISPR sgRNA lentivirus library at a multiplicity of infection 
(MOI) of 0.3 by spinoculation (1,000g) at 32 °C for 30 min in 12-well plates. 
After selection with puromycin for 7 days, ~1 x 10° cells were inoculated with 
CHIKV-181/25-mKate2 (MOI of 1) and then incubated for 24h to allow nearly 
all cells to become infected. Cells were sorted for an absence of mKate2 expres- 
sion using a Sony Biotechnology Synergy SY3200 Cell sorter (Siteman Flow 
Cytometry Core, Washington University). To enrich for the cell population 
that was resistant to CHIKV infection and increase the signal-to-noise ratio, 
the mKate2-negative cells were expanded in culture. Given the rapid replication
rate and cytopathic effect of CHIKV, two humanized neutralizing mAbs,
CHK-152 and CHK-166 (2 μg/ml of each), were added to block infection by
any residual virus. The expanded cells were re-infected with CHIKV-181/25-mKate2
in the absence of mAbs, sorted for mKate2-negative cells and the procedure
was repeated for one additional round.
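
The choice of a low MOI (0.3) for library transduction can be illustrated with a short calculation. The sketch below assumes that the number of lentiviral integrations per cell follows a Poisson distribution, which is a common modelling assumption and is not stated in the text; the function names are illustrative only.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that one cell receives exactly k particles at a given MOI."""
    return math.exp(-moi) * moi**k / math.factorial(k)


def transduction_summary(moi: float) -> dict:
    """Summarise transduction outcomes under the Poisson assumption."""
    p_zero = poisson_pmf(0, moi)
    p_one = poisson_pmf(1, moi)
    transduced = 1.0 - p_zero
    return {
        "untransduced": p_zero,
        "transduced": transduced,
        # Share of transduced cells that carry exactly one sgRNA.
        "single_among_transduced": p_one / transduced,
    }


if __name__ == "__main__":
    for name, value in transduction_summary(0.3).items():
        print(f"{name}: {value:.3f}")
```

Under this assumption, most cells that survive puromycin selection would carry a single sgRNA, which keeps each resistant phenotype attributable to one gene.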

Genomic DNA was extracted from the uninfected cells (5 × 10⁷) or the
mKate2-negative sorted cells (1 × 10’), and sgRNA sequences were amplified
and subjected to next generation sequencing using an Illumina HiSeq 2500
platform (Genome Technology Access Center, Washington University). The
sgRNA sequences against specific genes were determined after removal of the tag
sequences using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and
cutadapt 1.8.1. sgRNA sequences were analysed using a published computational
tool (MAGeCK) (see Supplementary Tables 1, 2).
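
The idea behind comparing sgRNA abundance in sorted and uninfected cells can be sketched as follows. This is not MAGeCK and does not reproduce its statistics; it is a simplified illustration that normalizes read counts to reads per million and computes a log2 enrichment per guide. The guide names, counts and pseudocount are hypothetical.

```python
import math


def reads_per_million(counts: dict) -> dict:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {guide: 1e6 * n / total for guide, n in counts.items()}


def log2_enrichment(sorted_counts: dict, control_counts: dict,
                    pseudocount: float = 1.0) -> dict:
    """log2 ratio of normalized abundance in sorted versus control cells."""
    sorted_rpm = reads_per_million(sorted_counts)
    control_rpm = reads_per_million(control_counts)
    guides = set(sorted_rpm) | set(control_rpm)
    return {
        g: math.log2((sorted_rpm.get(g, 0.0) + pseudocount)
                     / (control_rpm.get(g, 0.0) + pseudocount))
        for g in guides
    }


# Hypothetical toy counts: guideA is enriched in the mKate2-negative pool.
control = {"guideA": 500, "guideB": 500}
selected = {"guideA": 900, "guideB": 100}
enrichment = log2_enrichment(selected, control)

if __name__ == "__main__":
    for guide in sorted(enrichment):
        print(f"{guide}: {enrichment[guide]:+.2f}")
```

A dedicated tool additionally models count variance and aggregates guides per gene, which this sketch leaves out.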


Gene validation. Mxra8 was validated by using three independent sgRNAs as
follows: Mxra8 sgRNA1, 5′-CTTGTGGATATGTATTCGGC-3′; Mxra8 sgRNA2,
5′-TGTGCGCCTCGAGGTTACAG-3′; Mxra8 sgRNA3,
5′-GCTGCATGATCGCCAGCGCG-3′. The sgRNAs were cloned into the plasmid lentiCRISPR v.2
(Addgene #52961) and packaged with a lentivirus expression system. 3T3 or MEF cells
were transduced with lentiviruses expressing individual sgRNA and selected with
puromycin for seven days before infection with different viruses. For some validation
experiments, clonal cells edited by sgRNA1 were isolated by limiting dilution.
To validate the human orthologue MXRA8, two different sgRNAs targeting all four
isoforms were used: MXRA8 sgRNA1, 5′-GGCGCGGATGCCTTTGAGCG-3′;
MXRA8 sgRNA2, 5′-GTCCGCCTGGAGGTCACCGA-3′. The sgRNAs were
cloned similarly as above, and the gene-edited bulk cells were used for validation
studies.

For flow cytometric analyses, gene-edited 3T3 cells were inoculated with differ- 
ent viruses as follows: CHIKV-181/25 (MOI of 3, 9.5h), CHIKV-AF15561 (MOI 
of 10, 24h), CHIKV-37997 (MOI of 3, 10h), CHIKV-LR 2006 (MOI of 1, 9.5h), 
ONNV (MOI of 3, 12h), RRV (MOI of 3, 32h), MAYV (MOI of 3, 24h), SFV (MOI 
of 1, 9h), SINV (MOI of 10, 6h), SINV-WEEV (MOI of 10, 10h), SINV-EEEV (MOI
of 10, 10h), VEEV-GFP (MOI of 3, 6.5h), RVFV-GFP (MOI of 10, 8h), VSV-GFP
(MOI of 3, 6h), WNV (MOI of 10, 25h) and EMCV (MOI of 3, 6h). Gene-edited 
MEF cells were inoculated with CHIKV-181/25 (MOI of 3, 8h), CHIKV-AF15561 
(MOI of 10, 10h), CHIKV-37997 (MOI of 1, 10h) and CHIKV-LR 2006 (MOI of 
1, 8h). Gene-edited MRC-5, RPE and HFF-1 cells were inoculated with CHIKV- 
181/25 (MOI of 10, 10h), CHIKV-AF15561 (MOI of 10, 10h) and CHIKV-LR 2006 
(MOI of 1, 10h). Gene-edited Hs 633T cells were inoculated with CHIKV-181/25 
(MOI of 15), CHIKV-AF15561 (MOI of 15) and CHIKV-LR 2006 (MOI of 3) for 
11.5h. At the indicated times, cells were collected with trypsin, and fixed with 1%
paraformaldehyde (PFA) diluted in PBS for 15 min at room temperature and permeabilized
with Perm buffer (Hank’s Balanced Salt Solution (HBSS) supplemented
with 10mM HEPES, 0.1% (w/v) saponin, and 2% FBS) for 10 min at room temperature.
Cells then were incubated for 30 min at room temperature with 1 μg/ml of the
following virus-specific antibodies: CHIKV (mouse CHK-115), ONNV and MAYV
(mouse CHK-48), RRV (human 119), SINV (ascites fluid, ATCC VR-1248AF), SFV
(mouse CHK-124), VEEV (mouse 1A4A), WEEV (mouse 11A1), EEEV (mouse
EEEV-10), WNV (human E16) and EMCV (mouse serum). After washing, cells
were incubated with 2 μg/ml of Alexa Fluor 647-conjugated goat anti-mouse or
anti-human IgG (Invitrogen) for 30 min at room temperature. Cells were processed
on a MACSQuant Analyzer 10 (Miltenyi Biotec), and analysed using FlowJo software
(Tree Star).

Validation also was performed by an infectious virus yield assay. Gene-edited 
3T3 cells were plated 12h before infection. Cells were inoculated with CHIKV-181/25,
CHIKV-AF15561, CHIKV-LR 2006 at an MOI of 0.01 or other alphaviruses
(BEBV, MOI 0.001; BFV, GETV, UNAV, MIDV, and SFV, all at MOI of 0.01)
for 1h, then washed once and maintained in reduced 2% FBS culture medium. 
Supernatants were collected at specific times after infection for titration on Vero 
cells by focus-forming assay (CHIKV) or standard plaque assay (all other viruses). 
Genomic RNA transfection and analysis. To assess for effects of Mxra8 on 
CHIKV replication, we transfected capped viral genomic RNA into MEF cells. 
Capped genomic RNA was generated using an mMESSAGE mMACHINE SP6 
Transcription Kit according to the manufacturer’s instructions (Thermo Fisher 
#AM1340) from the NotI-linearized CHIKV-181/25 cDNA clone. One microgram 
of purified RNA was transfected into control or ΔMxra8 cells using the Neon
transfection system according to the manufacturer's instructions (Thermo Fisher
Scientific). Cells were then incubated in 15 mM NH4Cl to prevent subsequent
rounds of infection. At specified times, cells were collected with trypsin and pro- 
cessed for E2 expression levels by flow cytometry. 

Pseudotyped virus experiments. MLV-GFP pseudoviruses were made as
described except plasmids encoding structural proteins of CHIKV (strain
37997), VEEV (strain TrD) and EEEV (strain FL91-4697) were used. Pseudovirus 
entry in 3T3 cells expressing or lacking Mxra8 was assessed 36h later by measuring 
GFP expression by flow cytometry. 

Plasmid construction. The C-terminal Flag-tagged mouse Mxra8 corresponding 
to the transcript (NM_024263) was synthesized (Integrated DNA Technologies) 
and cloned into the lentivirus vector pCSII-EF1-IRES-Venus with restriction
sites NotI/BamHI. The Mxra8 sgRNA target sequences were mutated (cttgtggatatgtattcggcg
to ctGgtCgaCatgtaCAGCgcg) to avoid re-cutting by Cas9 protein
for the trans-complementation. Based on this plasmid, a truncation lacking the
cytoplasmic domain was constructed by PCR-mediated mutagenesis, the ΔC-tail
(378-442). To express the GPI-anchored Mxra8, the N-terminal 336 amino 
acids missing the transmembrane and cytoplasmic domains were fused with 
PLAP (ctggcegecccccegecggcaccaccgacgccgcgcacccgggecestccgtgstccccgcgttgct- 
tectctgctggccgggaccctgctgctgctggagacggccactgctccc) or the rodent herpesvirus 
Peru (RHVP) open reading frame R1 gene (tacccatacgatgttccagattacgctacgtcct- 
caccatccattggcggcccaaacatgactttactattggccatgatcatgtttgcgttaaagatagegstcg, HA tag 
is underlined) GPI anchor sequences. To assess the function of different human
MXRA8 isoforms, the cDNA of isoform 2 (NM_032348.3) containing C-terminal
Myc and Flag tags was purchased from OriGene (Cat. No. RC200955), and cloned 
into the lentivirus vector pCSII-EF1-IRES-Venus. Isoform 1 (NM_001282585.1), 
isoform 3 (NM_001282584.1) and isoform 4 (NM_001282583.1) were created by 
either mutagenesis of isoform 2 or gene synthesis (Integrated DNA Technologies), 
and cloned into the lentivirus vector pCSII-EF1-IRES-Venus containing C termi- 
nus Myc and Flag tags. 
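
The Cas9-resistant rescue construct relies on mismatches between the original sgRNA target and the mutated sequence. A minimal sketch of how those mismatches can be located is shown below; the two sequences are copied from the text above, while the helper function name is illustrative. Whether the substitutions are synonymous depends on the reading frame, which is not given here and is not checked.

```python
def mismatch_positions(original: str, mutated: str) -> list[int]:
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(original) != len(mutated):
        raise ValueError("sequences must have the same length")
    pairs = zip(original.lower(), mutated.lower())
    return [i + 1 for i, (a, b) in enumerate(pairs) if a != b]


# Sequences as given in the plasmid construction description.
ORIGINAL_TARGET = "cttgtggatatgtattcggcg"
MUTATED_TARGET = "ctGgtCgaCatgtaCAGCgcg"

positions = mismatch_positions(ORIGINAL_TARGET, MUTATED_TARGET)

if __name__ == "__main__":
    print(f"{len(positions)} mismatches at positions {positions}")
```

The positions found by direct comparison coincide with the bases written in upper case in the mutated sequence.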

Trans-complementation and ectopic expression experiments. The plasmids 
constructed above were packaged using the lentivirus expression system. Cells 
transduced with these lentiviruses were sorted for Venus-positive cells by flow 
cytometry. 3T3 cells were inoculated with CHIKV-181/25 (MOI 3) for 9.5h 
or CHIKV-AF15561 (MOI 10) for 20h. HeLa and A549 cells were inoculated 
with CHIKV-181/25 (MOI 3), CHIKV-AF15561 (MOI 10) or CHIKV-LR 2006 
(MOI 1) for 14h. 293T cells were inoculated with CHIKV-181/25 (MOI 1), 
CHIKV-AF15561 (MOI 3) or CHIKV-LR 2006 (MOI 0.5) for 11.5h. CHO cells 
(K1 and 745) were inoculated with CHIKV-181/25 (MOI 0.3), CHIKV-AF15561 
(MOI 10) or CHIKV-LR 2006 (MOI 0.3) for 12h. Cells were then collected and 
processed for E2 expression by flow cytometry. 

Generation and production of Mxra8-Fc and MXRA8-2-Fc. A cDNA fragment 
encoding the mouse Mxra8 extracellular domain (residues 22-336, GenBank 
accession number NM_024263) or the human MXRA8-2 extracellular domain 
(residues 20-337, GenBank accession number NM_032348.3) and the mouse 
IgG2b-Fc were synthesized (Integrated DNA Technologies) and inserted into 
the pCDNA3.4 vector (Thermo Fisher) downstream of an IL-2 signal peptide 
sequence. After confirmation by Sanger sequencing, Mxra8-Fc and MXRA8-2-Fc 
were expressed into Expi293 cells (Thermo Fisher). Cells were seeded at 5 x 10° 
cells per ml one day before transfection. Two hundred micrograms of plasmid was 
diluted in Opti-MEM (Thermo Fisher) and complexed with HYPE-5 transfection 
reagent before addition to cells. Transfected cells were supplemented daily with 
Expi293 medium and 2% (w/v) Hyclone Cell Boost. Four days after transfection, 
supernatant was collected, centrifuged at 3,000g for 15 min, and purified by protein 
A sepharose 4B (Thermo Fisher) chromatography. After elution and buffer neu- 
tralization, the purified protein was dialysed into 20mM HEPES, 150mM NaCl 
(pH 7.5), filtered through a 0.20-μm filter, and stored at −80°C. Mxra8 that was 
cleaved from the IgG backbone was generated by inserting an HRV cleavage 
sequence (LEVLFQGP) into Mxra8-Fc downstream of the Mxra8 coding sequence 
and before the mouse IgG sequence. Mxra8-HRV-Fc was expressed in Expi293 
cells as described above. After purification, Mxra8-HRV-Fc was cleaved using 
HRV 3C protease (Thermo Fisher) at a 1:10 ratio overnight at 4°C. Cleaved Fc 
fragments were depleted using protein A sepharose chromatography, and the purity 
of Mxra8 was confirmed using SDS-PAGE analysis. 

Expression of CHIKV virus-like particles. The CHIKV virus-like particles plas- 
mid (strain 37997) was provided by G. Nabel (via the Vaccine Research Center, 
NIH) and expressed in Expi293 cells as previously described. The supernatant
was collected four days after transfection, 0.2-μm filtered and stored at 4°C.

ELISA-based Mxra8-Fc binding assays. Anti-CHIKV human mAb 4N12 or
anti-EEEV human mAb 53 (J.E.C., unpublished data) were immobilized (50 μl,
2 μg/ml) onto Maxisorp ELISA plates (Thermo Fisher) overnight in sodium bicarbonate
buffer, pH 9.3. Plates were washed four times with PBS and blocked with
PBS supplemented with 4% BSA for 1h at room temperature. CHIKV-181/25 or
SINV-EEEV was diluted to 1.5 x 10” FFU per ml in PBS and 50 μl per well was
added for 1h at room temperature. Mxra8-Fc and respective positive (CHK-11
and EEEV-10 (A.S.K. and M.S.D., unpublished results)) and negative (OPG-Fc;
D.H.F., unpublished results) controls were diluted in PBS supplemented with 2%
BSA and incubated for 1h at room temperature. After serial washes with PBS,
plates were incubated with horseradish peroxidase-conjugated goat anti-mouse IgG
(H+L) (1:2000 dilution, Jackson ImmunoResearch) for 1h at room temperature.
After washing with PBS, plates were developed with 3,3′,5,5′-tetramethylbenzidine
substrate (Dako) and 2N H2SO4. Absorbance was read at 450 nm with a TriStar
Microplate Reader (Berthold). The mAb competition binding assay was performed
by incubating 10 μg/ml of indicated anti-CHIKV human mAbs for 30 min before
the addition of mMxra8-Fc, as described above; the anti-CHIKV human mAbs
were mapped previously to different epitopes by alanine scanning mutagenesis
and evaluated for neutralizing activity. Humanized anti-WNV mAb E16 was
included as a negative control in competition binding assays.

Surface plasmon resonance based Mxra8 binding assay. Surface plasmon 
resonance binding experiments were performed on a Biacore T200 system (GE 
Healthcare) to measure the kinetics and affinity of Mxra8 binding to CHIKV virus-like
particles. Experiments were performed at 30 μl/min and 25°C using 0.01 M
HEPES pH 7.4, 0.15 M NaCl, 3mM EDTA, 0.005% v/v surfactant P20 as running
buffer. CHK-265 mAb was immobilized onto a CM5 sensor chip (GE Healthcare)
using standard amine coupling chemistry, and CHIKV virus-like particles were
captured. Recombinant Mxra8 was injected over a range of concentrations (1 μM
to 20nM) for 5 min followed by a 10 min dissociation period. As a negative control,
murine norovirus was captured with mAb A6.2, as previously described. Real-time
data were analysed using BIAevaluation 3.1 (GE Healthcare). Kinetic profiles
and steady-state equilibrium concentration curves were fitted using a global 1:1 
binding algorithm with drifting baseline. 
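
The quantities obtained from a 1:1 binding fit are related by simple formulas, sketched below. This is not the BIAevaluation algorithm and it omits the drifting-baseline term; the rate constants are hypothetical values chosen only so that their ratio gives an affinity of about 200 nM, matching the order of magnitude reported in the main text.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant KD (M) for a 1:1 interaction."""
    return k_off / k_on


def complex_half_life(k_off: float) -> float:
    """Half-life (s) of the bound complex during the dissociation phase."""
    return math.log(2) / k_off


def fraction_bound_at_equilibrium(conc: float, kd: float) -> float:
    """Steady-state occupancy at analyte concentration conc (M)."""
    return conc / (kd + conc)


def association_response(t: float, conc: float, k_on: float, k_off: float,
                         r_max: float = 1.0) -> float:
    """Response at time t (s) of the association phase, without baseline drift."""
    k_obs = k_on * conc + k_off
    r_eq = r_max * fraction_bound_at_equilibrium(conc, k_off / k_on)
    return r_eq * (1.0 - math.exp(-k_obs * t))


# Hypothetical rate constants (not taken from the paper).
K_ON = 1.0e4    # 1/(M*s)
K_OFF = 2.0e-3  # 1/s
KD = dissociation_constant(K_ON, K_OFF)

if __name__ == "__main__":
    print(f"KD = {KD * 1e9:.0f} nM, half-life = {complex_half_life(K_OFF):.0f} s")
```

With a slow association rate, a 5 min injection does not reach equilibrium even at the highest concentration, which is one reason kinetic and steady-state fits are examined together.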

Cell-based Mxra8-Fc binding assay. 3T3 cells were inoculated with different 
viruses at the following MOI: CHIKV-181/25 (MOI of 3, 9.5h), ONNV (MOI of 
5, 12h), MAYV (MOI of 3, 24h), RRV (MOI of 3, 32h), SINV (MOI of 10, 9h) 
and VEEV (MOI of 3, 6.5h). Cells were detached using TrypLE (Thermo Fisher 
#12605010), washed twice with cold HBSS supplemented with 15 mM HEPES 
and 2% FBS. Cells were incubated with 1 μg/ml of Mxra8-Fc, OPG-Fc or viral 
E2-specific antibodies at 4°C for 25 min. After washing, cells were stained with goat 
anti-mouse IgG (H + L) conjugated with Alexa Fluor 647 for 25 min at 4°C. After 
washing twice, cells were fixed with 2% PFA for 10min at room temperature. After 
two additional washes, cells were subjected to flow cytometry analysis. 

Virus binding and internalization assays. The assays were conducted in sus- 
pension. MEF cells were collected using TrypLE and washed twice with ice-cold 
medium supplemented with 2% FBS. CHIKV-AF15561 virions were purified 
through a 25% glycerol cushion at 25,000 r.p.m. for 2h. For the binding assay, 
cells (5 x 10°) and virions (MOI of 20) were mixed in a 1.5-ml microcen- 
trifuge tube and incubated on ice for 45 min. After five cycles of centrifuga- 
tion and washing, cells were lysed in RLT buffer for RNA extraction using an 
RNeasy Mini Kit (QIAGEN #74104). For the internalization assay, after 5 
cycles of centrifugation and washing, cells were resuspended into medium 
supplemented with 2% FBS and 15 mM NH4Cl and then incubated at 37°C for
1h. Cells were chilled on ice and treated with 500 ng/ml proteinase K on ice
for 1h. After three additional washes, cells were lysed in RLT buffer for RNA 
extraction. RT-PCR was conducted with Gapdh as an internal control using a 
TaqMan RNA-to-CT 1-Step Kit (Thermo Fisher #4392938). Primers and probes 
used are as follows: fwd CHK181/AF: 5′-TCGACGCGCCATCTTTAA-3′;
rev CHK181/AF: 5′-ATCGAATGCACCGCACACT-3′; probe CHK181/AF:
5′ 6-FAM/ACCAGCCTG/ZEN/CACCCACTCCTCAGAC/3′ IABkFQ;
fwd Gapdh: 5′-GTGGAGTCATACTGGAACATGTAG-3′; rev Gapdh:
5′-AATGGTGAAGGTCGGTGTG-3′; and probe Gapdh:
5′ 6-FAM/TGCAAATGG/ZEN/CAGCCCTGGTG/3′ IABkFQ.
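
Normalization of viral RNA to Gapdh can be sketched with the comparative Ct calculation. The text does not state which quantification method was used, so the approach below (the 2^-ΔΔCt method, assuming roughly 100% amplification efficiency for both assays) and all Ct values are assumptions for illustration only.

```python
def relative_quantity(ct_target_sample: float, ct_ref_sample: float,
                      ct_target_control: float, ct_ref_control: float) -> float:
    """Relative target RNA level in a sample versus a control (2^-ddCt)."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: viral RNA appears two cycles later in the test cells
# than in control cells, while Gapdh is unchanged.
example = relative_quantity(
    ct_target_sample=26.0, ct_ref_sample=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)

if __name__ == "__main__":
    print(f"Relative viral RNA: {example:.2f}")
```

A two-cycle delay with unchanged reference Ct corresponds to a fourfold lower relative RNA level under these assumptions.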

For the flow cytometry-based binding assay, experiments were conducted as 
above but with 5 x 10* cells and an MOI of 200. After binding and washing, cells 
were fixed and stained with a mixture of mAbs (CHK-11, CHK-84, CHK-124 and 
CHK-166) (1 μg/ml) at room temperature for 25 min. Cells were washed once and
stained with 2 μg/ml of goat anti-mouse IgG (H+L) conjugated with Alexa Fluor
647 (Thermo Fisher #A21235) for 25 min. After two additional washes, cells were 
subjected to flow cytometry analysis. 

Surface staining of mouse Mxra8 and human MXRA8. Mouse or human cells 
were collected with TrypLE and washed twice with cold HBSS supplemented 
with 15mM HEPES and 2% FBS. Cells were incubated with anti-Mxra8 (mouse) 
Armenian hamster serum (1:300), anti-Mxra8 (mouse) hamster mAbs (1 μg/ml)
or anti-MXRA8 (human) mAb (1 μg/ml) (MBL International #W040-3) at 4°C for
25min. After washing, cells were stained with 2 μg/ml goat anti-Armenian hamster
IgG (H+L) conjugated with Alexa Fluor 647 (Abcam #ab173004) or goat anti-mouse
IgG (H+L) conjugated with Alexa Fluor 647 (Thermo Fisher #A21235)
for 25 min at 4°C. After two additional washes, cells were fixed with 2% PFA for 
10 min at room temperature. Cells were then washed twice and subjected to flow 
cytometry analysis. 

Cell viability assay. A CellTiter-Glo Luminescent Cell Viability Assay (Promega) 
was performed according to the manufacturer's instructions. In brief, 2 × 10⁴ 3T3
or MEF cells in 100 μl culture medium were seeded into opaque-walled 96-well
plates. 24h later, 100 μl of CellTiter-Glo reagent was added to each well and allowed
to shake for 2 min. After a 10-min incubation at room temperature, luminescence
was recorded by using a Synergy H1 Hybrid Plate Reader (Biotek) with an integration
time of 0.5 s per well.

Western blotting. Cells seeded in 6-well plates were washed once with PBS, chilled 
on ice in PBS and detached with a cell scraper. After spinning at 300 g for 5 min, 
cell pellets were lysed in 45 μl RIPA buffer (Cell Signaling #9806S) with a cocktail
of protease inhibitors (Sigma-Aldrich #S8830). Samples were prepared in LDS
buffer (Life Technologies) under reducing (+dithiothreitol) conditions. After 
heating (70°C, 10 min), samples were electrophoresed using 10% Bis-Tris gels 
(Life Technologies) and proteins were transferred to PVDF membranes using an 
iBlot2 Dry Blotting System (Life Technologies). Membranes were blocked with 
5% non-fat dry powdered milk and probed with hamster mAb 3G2.F5 (0.5 μg/ml) 
against mouse Mxra8. Western blots were developed using SuperSignal West Pico 
Chemiluminescent Substrate or SuperSignal West Femto Maximum Sensitivity 
Substrate (Life Technologies). 

Alanine scanning mutagenesis for mapping. A CHIKV E2, 6K and E1 envelope
protein expression construct (strain S27, UniProt Reference #Q8JUX5) with
a C-terminal V5 tag was subjected to alanine scanning mutagenesis to generate
a comprehensive mutation library. Each residue of the envelope proteins was
mutated to alanine, with alanine codons mutated to serine. One hundred and
forty-one mutations within the E2 A and B domains were screened for binding
by Mxra8-Fc. Binding of Mxra8-Fc to each mutant expressed in HEK-293T cells
was determined by immunofluorescence detected with a high-throughput flow
cytometer (Intellicyt), as previously described. Residues of domains A or B were
identified as contributing to the binding site if their mutation eliminated Mxra8-Fc
binding, but supported binding of CHIKV mAbs that bind to the appropriate
domain (control mAbs were CHKV-84, CHKV-88, IM-CKV063, IM-CKV065
and C9).
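
The substitution rule used to build the library (every residue to alanine, existing alanines to serine) can be written as a short enumeration. The sketch below works at the amino-acid level only; the three-residue input and the starting position are hypothetical and do not come from the E2 sequence.

```python
def alanine_scan(sequence: str, start: int = 1) -> list[str]:
    """List point mutations for an alanine scan of a protein segment.

    Each residue is replaced by alanine; residues that are already
    alanine are replaced by serine. Positions are numbered from `start`.
    """
    mutations = []
    for offset, residue in enumerate(sequence.upper()):
        replacement = "S" if residue == "A" else "A"
        mutations.append(f"{residue}{start + offset}{replacement}")
    return mutations


# Hypothetical toy segment, numbered from position 64 for illustration.
toy_library = alanine_scan("WAD", start=64)

if __name__ == "__main__":
    print(toy_library)
```

Applying the same rule to the residues of the E2 A and B domains yields one mutant per position, each screened separately for Mxra8-Fc binding.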

Mapping of mutations onto the CHIKV p62-E1 crystal structure. Figures were 
prepared using the atomic coordinates of CHIKV p62-E1 (RCSB accession num- 
ber 3N41) using the program PyMOL (The PyMOL Molecular Graphics System, 
Version 1.7.4 Schrödinger, LLC). 

mAb generation. Four-week-old male Armenian hamsters were immunized intravenously
with 100 μg of purified Mxra8-Fc. After two boosts (~7 months of age),
spleens were collected for hybridoma fusion and mAb production. Hybridoma 
supernatants were initially screened by ELISA using Mxra8-human Fc (mouse Fc 
was replaced by human Fc). As a second assay, we examined the binding of hybri- 
doma supernatants to Mxra8 on the surface of 3T3 cells by flow cytometry. Finally, 
a tertiary screen evaluating blockade of CHIKV-181/25 infection by hybridoma 
supernatants was performed in 3T3 cells. After limiting dilution subcloning, the 
seven clones with the strongest blocking activities were selected and expanded. 
Antibodies were purified using protein A sepharose 4B (Invitrogen #101042), 
dialysed in PBS, concentrated and filtered for in vitro and in vivo experiments. 
Blocking assays with Mxra8-Fc, MXRA8-2-Fc or anti-Mxra8 mAbs. Twenty- 
five thousand 3T3 or MRC-5 cells were seeded into 96-well plates 12h before 
treatment. CHIKV-181/25 virions were purified through a 25% glycerol cushion 
at 25,000 r.p.m. for 2h. Serially diluted Mxra8-Fc or MXRA8-2-Fc protein was 
incubated with purified virions (MOI of 3) for 1h at 37°C in a volume of 100 μl. 
The mixture was added to 3T3 or MRC-5 cells for 9.5h or 11.5h, respectively. 
Cells were then collected for intracellular E2 expression as measured by flow 
cytometry. For hamster mAb blocking experiments, 3T3 or MRC-5 cells were pre-incubated
with serially diluted mAbs for 1h at 37°C in a volume of 50 μl, and then
purified virions (MOI of 3) in 50 μl were added and incubated for 9.5 or 11.5h,
respectively. Cells were collected and intracellular E2 expression was analysed by 
flow cytometry. For hamster mAb blocking experiments on primary human cells 
(keratinocytes, dermal fibroblasts, synovial fibroblasts, osteoblasts, chondrocytes 
and skeletal muscle cells), 2 × 10⁴ cells were seeded into 96-well plates 12h before
treatment. Cells were pre-incubated with Armenian hamster isotype control
(Bio X Cell #BE0260), 1H1.F5 or 9G2.D6 mAb for 1h at 37°C in a volume of 50 μl
(50 μg/ml) and, subsequently, purified CHIKV-AF15561 virions (MOI of 15) in
50 μl were added and incubated for 10.5h. Cells were collected and intracellular
E2 expression was analysed by flow cytometry. 
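
The amount of virus needed for a given MOI follows directly from the cell number. The sketch below uses the 25,000 cells per well and MOI of 3 stated above; the stock titre is a hypothetical value, since titres of the purified virion preparations are not given.

```python
def infectious_units_needed(cells: int, moi: float) -> float:
    """Focus-forming units required to reach the requested MOI."""
    return cells * moi


def inoculum_volume_ul(cells: int, moi: float, titre_ffu_per_ml: float) -> float:
    """Volume of virus stock (in microlitres) that contains the required FFU."""
    return infectious_units_needed(cells, moi) / titre_ffu_per_ml * 1000.0


CELLS_PER_WELL = 25_000
MOI = 3
STOCK_TITRE = 1.0e8  # FFU per ml, hypothetical

ffu = infectious_units_needed(CELLS_PER_WELL, MOI)
volume = inoculum_volume_ul(CELLS_PER_WELL, MOI, STOCK_TITRE)

if __name__ == "__main__":
    print(f"{ffu:.0f} FFU per well, {volume:.2f} ul of stock")
```

In practice the stock would be diluted so that the required FFU are delivered in the working volume per well.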

Phosphatidylinositol-specific phospholipase C treatment. 3T3 cells (10°) 
expressing GPI-anchored Mxra8 were collected using TrypLE and washed twice 
with PBS. Cells were then treated with 1 U/ml of phosphatidylinositol-specific 
phospholipase C (PI-PLC) (Sigma-Aldrich #P8804) in 50 μl PBS at 37°C for 1h. 
After two more cycles of washing, cells were stained for Mxra8 expression and 
processed by flow cytometry analysis as described above. 

Mouse experiments. Experiments were carried out in accordance with the
recommendations in the Guide for the Care and Use of Laboratory Animals of the
National Institutes of Health after approval by the Institutional Animal Care and
Use Committee at the Washington University School of Medicine (Assurance
number A3381-01). Mxra8-Fc or an IgG control (JEV-13 mAb) (250 µg per mouse in
PBS) was administered via an intraperitoneal route to four-week-old wild-type male
C57BL/6J mice 6 h before subcutaneous inoculation in the footpad with 10° FFU of
CHIKV-AF15561. Alternatively, in co-injection experiments, CHIKV or ONNV
was mixed directly with Mxra8-Fc or JEV-13 (25 µg per mouse in PBS), and
incubated at 37°C for 30 min before inoculation. At 12 h, 24 h and 72 h
post-infection, animals were euthanized and, after perfusion with PBS, the
indicated tissues were collected. For antibody pre- or post-treatment experiments,
300 µg of purified hamster mAbs 1G11.E6 + 7F1.D8 or 4E7.D10 + 8F7.E1, or isotype
control PIP (Bio X Cell # BE0260), in PBS was administered via an intraperitoneal
route to four-week-old wild-type male C57BL/6 mice 12 h before, or 8 h or 24 h
after, subcutaneous inoculation in the footpad with 10° FFU of CHIKV-AF15561.
Virus was titrated by focus forming assay as described, using mouse CHK-11 for
CHIKV or mouse CHK-48 for ONNV as the detection antibody. Joint swelling was
monitored at 72 h post-infection via left foot measurements (width × height) using
digital calipers as previously described. Sample sizes were estimated to detect a
difference of three- to fivefold, depending on data distribution. Blinding and
randomization were not performed.
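
The swelling readout described above (width × height, expressed relative to the pre-infection measurement) can be sketched as follows. This is a minimal illustration, not the authors' analysis script; the function names and the caliper values are hypothetical.

```python
def foot_area(width_mm: float, height_mm: float) -> float:
    """Approximate foot cross-section as width x height (mm^2)."""
    return width_mm * height_mm


def percent_of_initial(width0: float, height0: float,
                       width1: float, height1: float) -> float:
    """Express a later measurement as a percentage of the initial one."""
    initial = foot_area(width0, height0)
    if initial <= 0:
        raise ValueError("initial measurement must be positive")
    return 100.0 * foot_area(width1, height1) / initial


# Hypothetical caliper readings (mm): before infection and at 72 h.
print(percent_of_initial(2.0, 3.0, 2.5, 3.0))  # 125.0
```

A value of 100 means no change from the pre-infection measurement; values above 100 indicate swelling.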

Statistical analysis. Statistical significance was assigned when P values were <0.05 
using Prism Version 7 (GraphPad). Cell culture experiments were analysed by mul- 
tiple t-tests with a Holm-Sidak correction or ANOVA with a multiple comparison 
correction. Analysis of levels of joint swelling or viral burden in vivo was deter- 
mined by a Mann-Whitney, ANOVA, Kruskal-Wallis or unpaired t-test depending 
on data distribution and the number of comparison groups. 
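
As an illustration of the multiple-comparison step, the sketch below implements the standard step-down Holm-Sidak adjustment in plain Python. It is not GraphPad Prism's implementation, which was the software actually used, and the raw P values are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def holm_sidak_adjust(p_values: Sequence[float]) -> List[float]:
    """Return step-down Holm-Sidak adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment over the (m - rank) hypotheses still in play.
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw P values from three t-tests (not taken from the paper).
raw = [0.01, 0.04, 0.03]
adjusted = holm_sidak_adjust(raw)
significant = [p < 0.05 for p in adjusted]  # threshold stated in the Methods
print(adjusted, significant)
```

With these inputs only the smallest P value remains below the 0.05 threshold after adjustment, which shows how the correction can change which comparisons are called significant.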

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All data supporting the findings of this study are available within 
the paper and its Supplementary Information. The Supplementary Tables pro- 
vide data for the CRISPR-Cas9 screen, statistical analysis and alanine scanning 
mutagenesis mapping of the Mxra8-Fc binding site on CHIKV. All other data are 
available from the corresponding author upon reasonable request. 
Extended Data Fig. 1 | CRISPR-Cas9-based gene editing screen.

a, Mouse 3T3 cells were transduced separately with two half libraries
(A+B) comprising 130,209 sgRNAs, selected with puromycin and then
inoculated with CHIKV-181/25-mKate2 (MOI of 1). One day later,
mKate2-negative cells were sorted and then expanded in the presence
of 2 µg ml−1 each of CHK-152 and CHK-166 neutralizing mAbs.
Several days later, cells were re-inoculated with CHIKV-181/25-mKate2
without neutralizing mAbs and re-sorted for mKate2-negative cells. This
procedure was repeated one additional time. Afterwards, genomic DNA
was collected for sgRNA sequencing and compared to the parent library
for abundance (see Supplementary Tables 1, 2). b, Diagram of the mouse
Mxra8 and human MXRA8 orthologues. The transcript identification
numbers and length of proteins are indicated to the right. Partial deletions
in the isoforms 3 and 4 are shown as dashed lines. c, Phylogenetic tree
of Mxra8 indicating genetic relationships. The neighbour-joining tree
was constructed using MEGA 7 (https://www.megasoftware.net/). Scale
bar shows the branch length. Right, identity (red) and similarity (yellow)
matrix indicating the conservation of Mxra8 between species. The matrix
was generated using MagGat 1.8 (http://iubio.bio.indiana.edu/soft/molbio/evolve/).
Extended Data Fig. 2 | Efficiency of targeting Mxra8 expression by
CRISPR-Cas9 gene editing. a, 3T3 cells were edited with a control or
three different Mxra8 sgRNAs. After puromycin selection, bulk cells were
inoculated with CHIKV-181/25-mKate2 and processed for marker gene
expression by flow cytometry. Data are pooled from three experiments
and expressed as mean ± s.d. (n = 6, one-way ANOVA with a Dunnett’s
multiple comparison test compared to control). b, Western blotting of
Mxra8 in control and ΔMxra8 3T3 or MEF cells using hamster mAb
3G2.F5. One representative of two is shown. c, 3T3 and MEF cells (parent or
ΔMxra8) were tested for Mxra8 surface expression by flow cytometry
using anti-Mxra8 antibody (4E7.D10) and an isotype control mAb. One
representative experiment of two is shown. d, Sanger sequencing of Mxra8
gene in control and ΔMxra8 3T3 or MEF cells. Sequencing data show an
alignment and individual out-of-frame deletions. e, Viability of control
and ΔMxra8 3T3 and MEF cells. Equal numbers of cells were plated
and viability was assessed over a 24-h period using the Cell-Titer Glo
assay. The results were normalized to control cells and pooled from two
experiments (n = 6). Error bars indicate s.d. ****P < 0.0001.
Extended Data Fig. 3 | CHIKV infectivity in CHO-K1 and CHO-745
cells in the presence or absence of ectopic Mxra8 expression.

a, Surface expression of Mxra8 on CHO-K1 (wild type) and CHO-745
(glycosaminoglycan deficient) cells stably transduced with control
(vector-only) or mouse Mxra8 as judged by flow cytometry.
b, Confirmation of heparan sulfate expression on the surface of CHO-K1
(wild type) and CHO-745 cells. Surface expression of heparan sulfate was
evaluated using the R17 protein of rodent herpesvirus Peru, which binds
to glycosaminoglycans on the surface of cells. A mutant form of R17
that lacks binding to glycosaminoglycans served as a negative control.
In a and b, data are representative of two experiments.
c, CHO-K1 (wild type) and CHO-745 cells were transduced stably with
control (vector) or mouse Mxra8 and inoculated with CHIKV (strains
181/25, AF15561 or LR-2006) and processed for intracellular E2 protein
staining by flow cytometry. Data are from three experiments: mean ± s.d.
(n = 6, one-way ANOVA with a Dunnett’s multiple comparison test).
***P < 0.001.
Extended Data Fig. 4 | Growth curves of related alphaviruses in
ΔMxra8 3T3 cells. Control and ΔMxra8 3T3 cells were inoculated
with BEBV, BFV, GETV, UNAV, MIDV or SFV at an MOI of 0.01 (except
for BEBV, which was at 0.001), and supernatants were collected at the
indicated times for focus forming assay. Data are pooled from two
(BEBV) or three (all others) experiments and expressed as mean ± s.d.
(n = 6, BEBV; n = 9, BFV, GETV, UNAV and MIDV; n = 12, SFV;
two-way ANOVA with Sidak’s multiple comparisons test). ***P < 0.001;
****P < 0.0001.
Extended Data Fig. 5 | Surface expression of MXRA8 in different
human cell lines. Human cell lines were tested for MXRA8 surface
expression by flow cytometry: 293T (embryonic kidney), A549
(lung adenocarcinoma), JEG3 (placental choriocarcinoma), U2OS
(osteosarcoma), HFF-1 (foreskin fibroblasts), HeLa (cervical carcinoma),
Huh7 (hepatocarcinoma), HTR8/SVneo (trophoblast progenitor), MRC-5
(lung carcinoma), hCMEC/D3 (cerebral microvascular endothelial cells),
RPE (retinal pigment epithelial cell), Jurkat (T cell lymphoma), Raji
(B cell lymphoma), K562 (erythroleukaemia), HT 1080 (fibrosarcoma)
and Hs 633T (fibrosarcoma) cells. Representative data are shown of two
independent experiments. Grey histograms, isotype control mAb; red
histograms, anti-MXRA8 mAb.
Extended Data Fig. 6 | MXRA8 supports enhanced infection of different
CHIKV strains. a, Transduction and expression of different MXRA8
(MXRA8-1, MXRA8-2, MXRA8-3 and MXRA8-4) isoforms in HeLa cells.
Representative data are shown from two experiments. Grey histograms,
isotype control mAb; red histograms, anti-MXRA8 mAb. b, Effect of
ectopic expression of MXRA8-2 on CHIKV (strains 181/25, AF15561,
and LR-2006) infection of A549, HeLa or 293T cells. Cells were collected
and stained for CHIKV antigen with an anti-E2 antibody. Data are pooled
from three experiments and expressed as mean ± s.d. (n = 6; two-tailed
t-test with Holm-Sidak multiple comparison correction). c, Transduction
and expression of MXRA8-2 in 293T, A549, and HeLa cells. Representative
data are shown from two experiments. ***P < 0.001; ****P < 0.0001.
Extended Data Fig. 7 | Gene-editing of MXRA8 in human cell lines.

a, Flow cytometry analysis of MXRA8 expression in human MRC-5,
HFF-1, RPE and Hs 633T cells expressing control or two different MXRA8
sgRNAs. Data are representative of two experiments. b, Gene-edited
HFF-1, RPE and Hs 633T cells were inoculated with CHIKV (strains
181/25, AF15561 or LR-2006). Cells were stained for viral antigen with
an anti-E2 antibody. Data are pooled from two (HFF-1 and Hs 633T) or
three (RPE) independent experiments and expressed as mean values ± s.d.
(n = 6; one-way ANOVA with a Dunnett’s multiple comparison test
compared to the control). ****P < 0.0001.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 



[Extended Data Fig. 8 image. Recoverable labels: a, SDS-PAGE of the Mxra8 or
MXRA8-2 ectodomain fused to mouse or human Fc, with molecular weight markers
(kDa) and non-reducing (NR) and reducing (R) lanes; b, purified mouse Mxra8-Fc,
boost, screens (ELISA using Mxra8-human Fc, cell-based binding assay, functional
blocking assay), subclone, expand, purify; c, Mxra8-Fc, MXRA8-2-Fc, OPG-Fc;
d, mAb blockade on MRC-5 cells, x axis [mAb] (log10 µg/mL), with legend values
1H1.F5, 0.84 µg/mL; 3G2.F5, 7.06 µg/mL; 9G2.D6, 2.01 µg/mL.]

Extended Data Fig. 8 | Mxra8-Fc and anti-Mxra8 generation and
function. a, Diagram of Mxra8-Fc (left) and SDS-PAGE (non-reducing
(NR) and reducing (R) conditions) of purified material (right). Data are
representative of three experiments. b, Scheme of anti-Mxra8 generation
in Armenian hamsters. c, ELISA reactivity of anti-Mxra8 mAbs against
Mxra8-Fc, MXRA8-2-Fc or OPG-Fc. Purified proteins (50 µl, 5 µg ml−1)
were immobilized overnight at 4°C on ELISA plates. Anti-Mxra8
and isotype control mAbs were incubated for 1 h at room temperature.
Signal was detected at 450 nm after incubation with horseradish
peroxidase-conjugated goat anti-Armenian hamster IgG (H + L) and development
with 3,3′,5,5′-tetramethylbenzidine substrate. d, Blockade of CHIKV-181/25
infection in MRC-5 cells with seven different hamster anti-Mxra8
or isotype control mAbs. mAbs were pre-incubated with cells for 1 h at
37°C before addition of virus. After infection, cells were processed for
E2 staining by flow cytometry, and relative infection was calculated
against a no-mAb condition. Data in c and d are pooled from two
experiments (n = 6) and expressed as mean ± s.d.
e, Anti-Mxra8 mAbs (1G11 + 7F1) or isotype control hamster mAbs
(300 µg total) were administered via intraperitoneal route 8 or 24 h after
inoculation of CHIKV-AF15561 in the footpad. Left, at 72 h after initial
infection, CHIKV titres were measured in the ipsilateral and contralateral
gastrocnemius (calf) muscles. Right, at 72 h, ipsilateral foot swelling was
measured and compared to measurements taken immediately before
infection. Data are pooled from two experiments (n = 8; two-tailed
Mann-Whitney test) and expressed as median values. *P < 0.05; n.s., not
significant.
[Panel labels recovered from the image: c, mAb concentrations 0.1, 1 and
10 µg/mL; e, y axes log10 FFU/g tissue (ipsilateral and contralateral) and
% initial joint measurement (foot swelling); groups, isotype and
1G11.E6 + 7F1.D8 at +8 h and +24 h.]
Extended Data Fig. 10 | Binding of Mxra8-Fc to surface-displayed
E2 protein in virus-infected cells and mapping of Mxra8 binding site
on E2. a, Diagram of the cell-based binding assay. After infection, viral
structural proteins (for example, E2) traffic to the cell plasma membrane
where progeny virions assemble and bud. E2 protein is displayed on the
cell surface and is accessible to the binding of Mxra8-Fc and detection
with a goat anti-mouse IgG secondary antibody by flow cytometry.
b, Binding of Mxra8-Fc to virus-infected wild-type 3T3 cells. Cells were
infected with the indicated viruses and processed for Mxra8-Fc binding
by flow cytometry. Virus-specific anti-E2 antibodies were used as positive
controls. Data are representative of two independent experiments.
c, d, Mapped residues are shown as magenta spheres (c) or sticks (d) on the
CHIKV p62-E1 structure (c, trimer of dimers, top view; d, heterodimer,
side view) using PyMOL (Protein Data Bank code: 3N41). The E1 and E2
proteins are coloured in grey and cyan, respectively.
Bystander CD8+ T cells are abundant and
phenotypically distinct in human tumour infiltrates
Various forms of immunotherapy, such as checkpoint blockade
immunotherapy, are proving to be effective at restoring T cell-mediated
immune responses that can lead to marked and sustained
clinical responses, but only in some patients and cancer types.
Patients and tumours may respond unpredictably to immunotherapy
partly owing to heterogeneity of the immune composition and
phenotypic profiles of tumour-infiltrating lymphocytes (TILs)
within individual tumours and between patients. Although there
is evidence that tumour-mutation-derived neoantigen-specific T
cells play a role in tumour control, in most cases the antigen
specificities of phenotypically diverse tumour-infiltrating T cells
are largely unknown. Here we show that human lung and colorectal
cancer CD8+ TILs can not only be specific for tumour antigens (for
example, neoantigens), but also recognize a wide range of epitopes
unrelated to cancer (such as those from Epstein-Barr virus, human
cytomegalovirus or influenza virus). We found that these bystander
CD8+ TILs have diverse phenotypes that overlap with tumour-specific
cells, but lack CD39 expression. In colorectal and lung
tumours, the absence of CD39 in CD8+ TILs defines populations
that lack hallmarks of chronic antigen stimulation at the tumour
site, supporting their classification as bystanders. Expression of
CD39 varied markedly between patients, with some patients having
predominantly CD39− CD8+ TILs. Furthermore, frequencies
of CD39 expression among CD8+ TILs correlated with several
important clinical parameters, such as the mutation status of lung
tumour epidermal growth factor receptors. Our results demonstrate
that not all tumour-infiltrating T cells are specific for tumour
antigens, and suggest that measuring CD39 expression could be a
straightforward way to quantify or isolate bystander T cells.

Using mass cytometry and a panel dedicated to the detailed profiling
of tumour-infiltrating T cells we observed that, consistent with previous
reports, CD8+ TILs constitute a highly heterogeneous cell population
both within individual tumours (Fig. 1a) and among patients
with lung and colorectal tumours (n = 144 patient tumours analysed
by mass cytometry in this study) (Fig. 1b and Extended Data Fig. 1).
We therefore decided to investigate the antigen specificity of CD8+
TILs to better understand the basis for this heterogeneity. In total, we
screened for 1,091 putative neoantigens, 123 tumour-associated antigens
(TAA) and 46 cancer-unrelated epitopes (mostly virus-derived)
using mass cytometry coupled to multiplex major histocompatibility
complex (MHC)-tetramer staining, as reported previously (Fig. 2,
Extended Data Fig. 2 and Supplementary Tables 1-3). Two positive
hits were detected for neoantigen epitopes from a total of 24 patients
tested (Fig. 2b, c and Supplementary Table 4). As 0.18% of the 1,091
computationally predicted putative neoantigens could be confirmed
experimentally, these data are in line with other publications reporting
identification rates for neoantigen-specific CD8+ T cells from predicted
neoantigens of between 0% and 0.5%. The small number of
neoantigen-specific T cell populations detected may also be related to
the relatively low mutational burden of these tumours (Supplementary
Table 4), even though neoantigen-specific T cell responses have previously
been reported in the context of other tumours with low mutational
burden. Nevertheless, these results highlight the challenge of
accurately predicting and validating neoantigens for therapeutic purposes.
We also detected two tumour-specific CD8+ TIL populations
in an unusual case of lung cancer associated with Epstein-Barr virus
(EBV) infection (lymphoepithelioma-like carcinoma, LELC) (Extended
Data Fig. 3). Despite testing 40 patient tumours with large panels of
TAA-derived epitopes, we failed to identify TAA-specific CD8+ TILs.
Data from in vitro expanded CD8+ T cells have shown that these
cells can be detected by MHC-tetramer staining. MART-1 and
NY-ESO-1 epitope-specific T cells have also been detected in unexpanded
TILs. It is possible that TAA-specific CD8+ TIL cells were
absent or present at undetectably low frequencies in all of the samples
we tested.
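
The 0.18% confirmation rate quoted above is simply the two validated neoantigen hits divided by the 1,091 predicted epitopes screened. A minimal arithmetic check, with an illustrative function name that is not from the paper:

```python
def hit_rate_percent(confirmed: int, screened: int) -> float:
    """Return the percentage of screened epitopes that were confirmed."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return 100.0 * confirmed / screened


# Counts taken from the text: 2 confirmed hits among 1,091 predicted neoantigens.
rate = hit_rate_percent(2, 1091)
print(f"{rate:.2f}%")  # 0.18%

# The text cites published identification rates of between 0% and 0.5%.
print(0.0 <= rate <= 0.5)  # True
```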

Unexpectedly, we detected cancer-unrelated MHC-tetramer+ cells
(n = 46 CD8+ T cell populations) in cohorts of patients with lung cancer
or colorectal cancer (in 9 of 24 lung cancer patients, 37.5%; and in 21
of 42 colorectal cancer patients, 50%) (Fig. 2b). In these cases,
MHC-tetramer+ CD8+ TILs were specific for various Epstein-Barr virus
(EBV), human cytomegalovirus (HCMV) or influenza virus epitopes
that were presented by three different HLA alleles (Fig. 2d). Frequencies
for individual epitopes varied between 0.07% and 3.3% of total CD8+
TILs, and 11 examples of these were validated using fluorescence flow
cytometry (Fig. 2d, e, Extended Data Fig. 4). The expression of CD69
and/or CD103 in many of these cancer-unrelated CD8+ TILs suggests
that they are not derived from blood contamination (Extended Data
Fig. 4). These data therefore show that CD8+ TILs are not all specific
for tumour antigens, but can include bystander CD8+ TILs that are
specific for cancer-unrelated epitopes.

Having identified cancer-unrelated bystander and tumour-specific
CD8+ TILs, we next compared the phenotypes of these two populations
with those of remaining CD8+ TILs of unknown specificity. All
the tumour-specific CD8+ TILs that we identified displayed resident
memory T cell-like phenotypes and expressed various co-stimulatory
and inhibitory receptors, such as PD-1 (Fig. 3a and Extended Data
Figs. 3, 5). Surprisingly, we observed overlapping but diverse phenotypic
profiles for cancer-unrelated CD8+ TILs with respect to these
markers. Many of the bystander CD8+ TILs also expressed resident
memory T cell-like phenotypes as well as various co-stimulatory
Fig. 1 | Tumour-infiltrating CD8+ TILs
are phenotypically heterogeneous within a
tumour and across patients. a, t-distributed
stochastic neighbour embedding (t-SNE) map
of CD8+ TILs isolated from a colorectal tumour.
t-SNE was performed on one patient to explore
the heterogeneity of CD8+ TILs within an
individual (see also Extended Data Fig. 1 and
Methods). Representative data from one patient
(see Methods for source data availability).
b, t-SNE map of CD8+ TILs isolated from lung
tumours or colorectal tumours. Mass cytometry
and t-SNE were performed simultaneously on
six different patients from each cohort to explore
the heterogeneity of CD8+ TILs across patients.
Patient identifiers refer to individual patients.
Representative data from n = 6 patients for each
cancer type.
and activation marker molecules. The inhibitory receptors TIGIT
and PD-1, two markers that were previously shown to be expressed
by tumour-antigen-specific CD8+ T cells, were also expressed by
many of these cells (Fig. 3a, c). Although PD-1 has been proposed as
a marker of tumour-specific CD8+ T cells, our results are consistent
with previous reports of virus-specific CD8+ T cells infiltrating
tumours that express PD-1 in mice. However, we observed a striking
lack of CD39 expression in bystander CD8+ TILs (5.2 ± 8.4% (s.d. is
used throughout), n = 46). By contrast, CD39 was highly expressed
by tumour-specific CD8+ TILs and variably expressed by cells of
unknown specificity (40.4 ± 27.2%, P < 0.0001) (Fig. 3b, c). CD39
is a transmembrane extracellular ATPase that is widely expressed by
regulatory T cells, B cells and some tumour cells. In conjunction with
the enzymatic activity of CD73, CD39 can catalyse the conversion of
ATP to adenosine, which has been shown to have immunosuppressive
activity. Based on these data, we hypothesize that the lack of
CD39 could be used to enrich for cancer-unrelated bystander CD8+
TILs. Conversely, although these results suggest that CD39
could also be a useful marker of tumour-specific CD8+ TILs, this link
could be observed only in two neoantigen responses from two patients
and two tumour-specific CD8+ TIL populations in an unusual LELC
tumour.

To better compare the characteristics of CD39− CD8+ and CD39+
CD8+ TILs, we performed transcriptomic profiling. Using principal
component analysis (PCA) and gene set enrichment analysis (GSEA),
we found that CD39+ CD8+ TILs were enriched in expression of genes
related to cell proliferation and exhaustion, which are characteristics
of chronically stimulated T cells (Fig. 4a, b and Extended Data
Fig. 6), consistent with previous reports in both cancer and infectious
disease. In line with this, T cell receptor (TCR) sequencing
indicated skewed and reduced TCR sequence diversity
in CD39+ CD8+ TILs (Extended Data Fig. 7), supporting the notion
of enrichment for cells that have undergone tumour-antigen-driven
clonal expansion. At the protein level, compared to their CD39−
counterparts, CD39+ CD8+ TILs from colon and lung tumours showed
hallmarks of exhausted cells in terms of both phenotypic and functional
markers (Fig. 4c, d and Extended Data Fig. 8), consistent with
the transcriptomic profiling data. Thus, expression of CD39 defines
a population of highly exhausted cells, whereas the absence of CD39
in CD8+ TILs defines a population whose phenotype is inconsistent
with chronic antigen stimulation at the tumour site, consistent with a
bystander role.

Next, we investigated whether expression of CD39 by CD8+ TILs
was linked to clinical parameters measured in either of the studied
patient cohorts. In colorectal tumours, we detected highly heterogeneous
frequencies of CD39 expression among CD8+ TILs (n = 94;
mean, 44.5 ± 23.7%; minimum, 0.2%; maximum, 85.8%) (Fig. 4e
and Extended Data Fig. 9). No significant correlations with any
clinical parameters (tumour mutational burden, driver mutational
status or consensus molecular subtype (CMS)) were obtained from
this cohort (Extended Data Fig. 9). However, based on the transcriptomic
profiles of adjacent frozen tumour sections, we found
that tumours with higher percentages of CD39+ CD8+ TILs had
gene expression profiles indicative of T cell inflammation and
expression of pathways associated with antigen processing and
presentation (Extended Data Fig. 10). In lung cancer patients, the
expression of CD39 in CD8+ TIL cells was also highly heterogeneous
across patients (n = 50; mean, 21.9 ± 23.23%; minimum, 0%;
maximum, 88.5%) (Fig. 4e and Extended Data Fig. 9). For this type
of cancer, the frequency of CD39+ CD8+ TIL cells clearly correlated
with the epidermal growth factor receptor (EGFR)-mutation
status, an oncogenic driver mutation that is especially common in
East Asian patients with lung cancer. Preliminary reports suggest
that patients with EGFR-mutated tumours are relatively poor
responders to anti-PD-1 treatment and have low CD8+ T cell
density compared to patients with EGFR-wild-type tumours.
Fig. 2 | Tumour-specific and cancer-unrelated CD8+ T cells infiltrate
tumour tissues. a, Schematic for screening of neoantigens (NeoAg),
tumour-associated antigens (TAA) and cancer-unrelated epitopes by
mass cytometry coupled to multiplex MHC-tetramer staining. See also
Extended Data Fig. 2 and Supplementary Tables 1-3 for examples and list
of peptides. b, Total number of different MHC class I tetramers screened
for neoantigens, TAA and cancer-unrelated epitopes by mass cytometry
(left). Total number of different MHC class I tetramers identified for
neoantigens, TAA and cancer-unrelated epitopes by mass cytometry
(right). Neoantigens are colour-coded by patient. See also Supplementary
Table 4 for patient information. c, Flow cytometry dot plots representing
MHC tetramer+ CD8+ TILs identified from colorectal tumour CD3+
TILs using mass cytometry screening. MHC-tetramer mutated (mut)
DST epitope (mutated residue is underlined), ISDEMFKTFK;
MHC-tetramer mutAHR epitope, GISQELPYK. Percentages represent the
MHC tetramer+ cells among CD8+ TILs for each patient. Data from two
independent experiments. d, Frequencies of cancer-unrelated CD8+ TILs
identified by mass cytometry and multiplex MHC-tetramer staining in
lung tumours (right, n = 11 MHC tetramer+ populations) or colorectal
tumours (left, n = 35 MHC tetramer+ populations). Peptide sequences are
listed in Supplementary Table 3. Inf., influenza virus. e, Representative
flow cytometry dot plot showing cancer-unrelated CD8+ TILs specific for
different epitopes, identified from colorectal tumour CD3+ TILs using
mass cytometry screening. Percentages are of MHC tetramer+ cells among
CD8+ TILs for each patient. See also Extended Data Fig. 4. Data are from
two independent experiments.
Fig. 3 | Cancer-unrelated CD8+ TILs do not express CD39. a, Expression
of markers by cancer-unrelated (blue, n = 46 biologically independent MHC
tetramer+ cells) and tumour-specific CD8+ TILs (red, n = 4 biologically
independent MHC tetramer+ cells) in human tumours. Triangles represent
neoantigen-specific CD8+ TILs and squares represent tumour-specific
CD8+ TILs derived from an LELC (see Extended Data Figs. 3, 5). Data are
mean ± s.d. from at least ten independent mass cytometry experiments.
Each data point represents an antigen-specific population. b, Expression
of CD39 by cancer-unrelated (blue, n = 46 biologically independent MHC
tetramer+ cells) or tumour-specific CD8+ T cells (red, n = 4 biologically
independent MHC tetramer+ cells) with paired total CD8+ TILs. Triangles
represent neoantigen-specific CD8+ TILs and squares represent
tumour-specific CD8+ TILs in an LELC (see Extended Data Fig. 3). Data are mean
± s.d.; paired two-tailed t-test; lung tumour only, P = 0.0064; colorectal
tumour only, *P < 0.0001. c, Representative flow cytometry dot plot
representing the expression of PD-1 and CD39 by cancer-unrelated (blue)
or tumour-specific CD8+ TILs (red) identified from colorectal tumours by
mass cytometry. Frequencies of CD39+ cells among cancer-unrelated (blue),
tumour-specific (red) or all CD8+ TILs (grey) for each patient. Data are
from two independent experiments.
Fig. 4 | Comparative analysis of CD8+ TILs stratified by CD39
expression. a, Projection of the whole transcriptome of sorted blood
naive (CCR7+ CD45RO−, white; n = 4 patients), blood effector (CCR7−
CD45RO+, grey; n = 5 patients), tumour CD39− (blue, n = 7 patients)
and CD39+ CD8+ TILs (red, n = 8 patients) using PCA. Ellipses represent
the standard deviation around the centroid of a phenotype. See also
Supplementary Table 5. b, Enrichment of the gene set for exhausted
T cells in CD39+ CD8+ TILs. Genes towards the left are enriched in
CD39+ CD8+ TILs (n = 8 patients), genes towards the right are enriched
in CD39− CD8+ TILs (n = 7 patients). Two-sided GSEA empirical test.
See also Extended Data Figs. 6, 7. c, Mass cytometry dot plots representing
expression of OX-40, ICOS, Ki67, TIM-3, PD-1 and CTLA-4 according to
CD39 status by CD8+ TILs. Representative data from one patient with a
colorectal tumour. Data are from at least ten independent mass cytometry


In our study cohort, the frequency of CD39+ CD8+ TIL cells was
significantly higher in patients with EGFR-wild-type tumours
(32.3% ± 20.35%) in comparison to those with EGFR-mutant
tumours (16.24% ± 23.51%), where these cells were virtually absent
(<2% of CD8+ TIL cells) in 50% of these patients (Fig. 4f). These
data suggest that the poor observed response to anti-PD-1 treatment
could be associated with the abundance of bystander CD39− CD8+
TIL cells in patients with EGFR-mutant tumours.
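
The size of the EGFR comparison can be gauged from the summary statistics alone. The sketch below computes a pooled-variance (Student's) t statistic from the reported means and standard deviations and the group sizes given in the Fig. 4f legend (n = 17 wild type, n = 25 mutant). It assumes equal variances, which the paper does not state, and it is not the authors' analysis (they report using GraphPad Prism); the function name is illustrative.

```python
import math
from typing import Tuple


def pooled_t_from_summary(mean1: float, sd1: float, n1: int,
                          mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for an unpaired Student's t-test.

    Uses the pooled-variance form, which assumes equal variances in both groups.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Values from the text and the Fig. 4f legend (percent CD39+ among CD8+ TILs).
t_stat, dof = pooled_t_from_summary(32.3, 20.35, 17, 16.24, 23.51, 25)
print(round(t_stat, 2), dof)
```

The corresponding two-tailed P value would be read from a t distribution with the returned degrees of freedom; a Welch (unequal-variance) test would give slightly different numbers.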

We also obtained peripheral blood from a patient with
microsatellite-instable metastatic colorectal cancer who achieved a rapid clinical
and radiological response with anti-PD-1 treatment (pembrolizumab).
The proliferation marker Ki67 enables identification and monitoring
of the immunological response during anti-PD-1 treatment in peripheral
blood. Of note, the proliferating CD8+ T cells in the blood from
this patient were characterized by high expression of CD39 (Fig. 4g).
These data are consistent with an expansion of a CD39+ population in
the peripheral blood of a patient responding to anti-PD-1 treatment.
experiments. d, Frequencies of the expression of activation markers (top)
and inhibitory markers (bottom) by CD39− (blue) and CD39+ (red) CD8+
TILs in colorectal tumours (n = 9 to 45 patients). Data are mean ± s.d.
from at least ten independent mass cytometry experiments. Two-tailed
paired t-test. *P < 0.0001. See also Extended Data Fig. 8. e, Frequencies
of CD39+ CD8+ T cells in lung tumours (n = 50 patients), colorectal
tumours (n = 94 patients) and the matched PBMCs. Data are mean ±
s.d. f, Frequencies of CD39+ CD8+ T cells in lung tumours stratified by
mutation status of EGFR. Wild-type (WT) EGFR, n = 17 patients; mutant
EGFR, n = 25 patients. Data are mean ± s.d., two-tailed unpaired t-test.
g, Schematic for blood collection time point in colorectal cancer patients
treated with pembrolizumab (anti-PD-1 treatment) (top). t-SNE map
of CD8+ T cells isolated from PBMC at −1, 9 and 24 days relative to
pembrolizumab treatment (bottom).


This suggests that serial changes in CD39+ T cells may be an early
blood-based readout of anti-tumour-specific CD8+ responses and may
indicate a promising role for CD39 in monitoring immune checkpoint
therapy.

In summary, our results demonstrate that human CD8+ TILs can not
only be specific for tumour antigens but may also contain a population
that recognizes cancer-unrelated epitopes, characterized by an absence
of CD39 expression. Our data show that tumours can possess a wide
range of frequencies and phenotypes of bystander infiltrating CD8+
T cells. Conversely, although partially based on very limited examples
of tumour-specific T cell populations (two neoantigen-specific populations
from two patients and two tumour-specific populations from
an uncommon LELC tumour), our data also suggest that CD39 could
be useful as a marker of tumour-specific CD8+ T cells, which could
be exploited for the development of novel biomarkers or therapeutics.
Future work will determine the extent to which CD39 is a useful marker
of tumour specificity.
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METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Human samples. PBMCs, adjacent normal and tumour samples (lung and
colon) were obtained from patients with colorectal cancer or lung cancer (see
Supplementary Table 6). The use of human tissues was approved by the appropriate
institutional research boards, A*STAR and the Singapore Immunology Network.

Cell isolation. Samples were prepared as previously described. In brief, tissues
were mechanically dissociated into small pieces and incubated at 37 °C for 15 to 40
min in DMEM + collagenase IV (1 mg/ml) + DNase (15 µg/ml). Digestion was
stopped by addition of RPMI containing 5% FBS. Dissociated tissues were filtered
and washed in RPMI + 5% FBS + DNase (15 µg/ml). All samples were cryopreserved
in 90% FBS + 10% DMSO and stored in liquid nitrogen.

MHC monomer production and heavy metal labelling of streptavidin and antibodies.
Purified antibodies lacking carrier proteins were purchased according to
the list in Supplementary Table 7. Antibody conjugation was performed according
to the protocol provided by Fluidigm. Streptavidin was labelled as previously
described. HLA-A*11:01 and HLA-A*24:02 monomers were refolded with the appropriate
UV-cleavable peptide and biotinylated as described.
Tumour-associated antigen and neoantigen epitope prediction. Binding of
epitopes of common TAA proteins to HLA-A*11:01 or HLA-A*24:02 was predicted
using the NetMHC3.4 algorithm. Candidate epitopes (8 to 11 amino
acids) with an affinity <500 nM were selected. For neoantigens, exome and RNA
sequencing were performed at the Genome Institute of Singapore (GIS). Tumour and
adjacent normal tissues were harvested by pathologists following surgical resection.
DNA and RNA were extracted from clinical specimens using the Qiagen AllPrep
DNA/RNA kit. After the QC step, DNA was sonicated to shorter fragments using
the Covaris system, end-repaired and ligated with sequencing adapters. Agilent
SureSelect Human All Exon V6 was used for exome capture before sequencing.
The Illumina HiSeq 4000 platform was used for whole exome sequencing (WES).
For DNA sequencing, short sequence reads were mapped against human reference
genome GRCh37 using BWA-MEM with default parameters. Following
GATK best practice, PCR duplicates were first removed, and reads were subsequently
realigned and recalibrated (available at: https://github.com/gis-rpd/pipelines, GATK
v.3.5). Somatic mutations (SNVs) were identified using MuTect (version 1.1.7).
Somatic insertions and deletions (indels) were called using Strelka with default
parameters. The raw RNA-seq data were mapped to the reference genome using
STAR. The data were ported to RSEM to generate count data and subsequently
normalized using DESeq2. Non-synonymous mutations and indels were used to
predict the mutated protein sequences using ANNOVAR. A list of peptides 8-11
amino acids in length that are not seen in normal cells (covering either the mutated
position or the novel C-terminal sequence due to frame-shift indels) was generated.
Subsequently, the binding affinity of every mutant peptide and its corresponding
wild-type peptide to the predicted HLA-A alleles was estimated using NetMHC3.4.
The putative neoantigens were identified as the mutant peptides with a predicted
binding strength of <500 nM and lower binding strength than that of the corresponding
wild type. All peptides (predicted TAA, published TAA, neoantigens)
were synthesized at Mimotopes and diluted in DMSO to a final concentration
of 10 mM.
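
The peptide enumeration and threshold filter described in this section can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy protein and the affinity values are hypothetical, and reading "lower binding strength than wild type" as a numerically lower predicted value (in nM) is an assumption.

```python
from typing import Iterator, List, Tuple


def peptides_covering(protein: str, pos: int,
                      min_len: int = 8, max_len: int = 11) -> Iterator[str]:
    """Yield every peptide of min_len..max_len residues that contains index pos (0-based)."""
    for length in range(min_len, max_len + 1):
        first = max(0, pos - length + 1)        # earliest start that still reaches pos
        last = min(pos, len(protein) - length)  # latest start that stays inside the protein
        for start in range(first, last + 1):
            yield protein[start:start + length]


def select_candidates(predictions: List[Tuple[str, float, float]],
                      threshold_nm: float = 500.0) -> List[str]:
    """Keep mutant peptides predicted below the threshold and below their wild-type value.

    Each tuple is (mutant_peptide, mutant_nM, wild_type_nM). Treating "lower than
    wild type" as a numerically lower predicted value is an assumption of this sketch.
    """
    return [pep for pep, mut_nm, wt_nm in predictions
            if mut_nm < threshold_nm and mut_nm < wt_nm]


# Hypothetical toy protein with the mutated residue at index 7.
toy_protein = "ABCDEFGHIJKLMNOP"
windows = list(peptides_covering(toy_protein, pos=7))

# Made-up affinities for illustration only; real values would come from a predictor.
toy_predictions = [("PEPTIDEA", 120.0, 900.0),
                   ("PEPTIDEB", 480.0, 300.0),
                   ("PEPTIDEC", 650.0, 2000.0)]
selected = select_candidates(toy_predictions)
```

Every window produced by `peptides_covering` spans the mutated position, so each one would be a candidate for affinity prediction; frame-shift indels would need a separate enumeration over the novel C-terminal sequence, which is not shown here.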

mRNA sequencing data analysis. The paired-end RNA-seq reads from HiSeq 4000
were mapped to the human GRCh38/HG19 reference genome using the STAR software
tool. The mapped paired-end reads were summarized to gene level using the
featureCounts v.1.5.0-p1 software tool with GENCODE v26 gene annotation.
Genes with read count less than five in fewer than two samples in all cell populations
were filtered out from further analysis. The limma-voom pipeline was used for
differentially expressed gene (DEG) analysis. DEGs from comparisons between
different cell populations were selected with Benjamini-Hochberg adjusted P value
of <0.05. All analyses were done in R v.3.1.2. We ran PCA using the prcomp function
from the stats (v.3.4.2) R package, using the whole transcriptomic data. Data
ellipses were drawn using the ordiellipse function of the vegan (v.2.4-5) R package.
We used the HTSanalyzeR package (v.2.26.0) to run GSEA and hypergeometric
tests on gene collections from the Gene Ontology Biological Processes database,
filtered for gene sets with at least 20 genes present in our dataset. For GSEA we used
1,000 permutations to estimate P values and applied corrections for multiple tests
using the Benjamini-Hochberg procedure. Results were displayed by plotting the
40 most significant pathways using the enrichment map package.
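
For readers unfamiliar with the Benjamini-Hochberg adjustment used above, the following is a minimal pure-Python sketch of the standard step-up procedure. It is an illustration only (the analysis itself was run in R), and the function name and example P values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up P values for four hypothetical genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

# Indices of genes that would pass an adjusted P value cut-off of 0.05.
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```

Selecting on the adjusted values, rather than on the raw P values, is what controls the false discovery rate across the whole set of tested genes or pathways.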

We performed hierarchical clustering on the Pearson correlation matrix of the
genes whose variance was in the top 25%, using Ward’s linkage criterion and 1 − r
as the distance function, where r is a pairwise Pearson’s correlation coefficient. The
resulting dendrogram was cut into 10 subtrees to obtain gene modules.
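
The two preparatory steps of this clustering, selecting the most variable genes and computing the 1 − r distance, can be sketched in plain Python as follows. The function names and the toy expression values are hypothetical; the Ward linkage and the cut into ten subtrees are not shown and would normally be delegated to a clustering library.

```python
import math
from statistics import variance
from typing import Dict, List, Sequence, Tuple


def top_variance_genes(expr: Dict[str, List[float]], fraction: float = 0.25) -> List[str]:
    """Return the gene names whose variance lies in the top `fraction` of all genes."""
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; assumes neither vector is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_distance(expr: Dict[str, List[float]],
                         genes: List[str]) -> Dict[Tuple[str, str], float]:
    """Pairwise distance 1 - r between the expression profiles of the given genes."""
    return {(a, b): 1.0 - pearson(expr[a], expr[b]) for a in genes for b in genes}


# Hypothetical expression values for four genes across four samples.
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 9.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [5.0, 5.0, 5.0, 5.1],
}
most_variable = top_variance_genes(toy_expr, fraction=0.25)
toy_dist = correlation_distance(toy_expr, ["g1", "g3"])
```

With this distance, perfectly co-varying genes sit at 0 and perfectly anti-correlated genes at 2, so modules group genes whose expression rises and falls together across samples.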

NATURE | www.nature.com/nature

To compute clonality indices, we first converted the measured counts for TCR
to frequencies. We then computed Shannon's entropy normalized for the number
of unique pairings n by $H = -\sum_{i=1}^{n} f_i \log(f_i) / \log(n)$, where $f_i$ is the frequency
of pairing i, as previously proposed. H varies from 1 for a uniform distribution
to 0 for an entirely clonal distribution. Clonality C was thus defined as
C = 1 − H and varies from 0 to 1, with 1 indicating high clonality.
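
As a worked illustration of the clonality index, the sketch below computes one minus the normalized Shannon entropy from a list of clone counts. The function name and the example counts are hypothetical, and returning 1.0 when only one clone is present (where the normalization by log(n) is undefined) is an assumption of this sketch rather than something stated in the text.

```python
import math
from typing import Sequence


def clonality(counts: Sequence[float]) -> float:
    """Clonality C = 1 - H, with H the Shannon entropy normalized by log(n)."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]  # convert counts to frequencies
    n = len(freqs)
    if n < 2:
        # Assumption: a single clone is treated as fully clonal, since log(1) = 0.
        return 1.0
    entropy = -sum(f * math.log(f) for f in freqs) / math.log(n)
    return 1.0 - entropy


# Hypothetical repertoires: evenly distributed clones versus one dominant clone.
uniform_repertoire = clonality([10, 10, 10, 10])
skewed_repertoire = clonality([97, 1, 1, 1])
```

An evenly distributed repertoire gives a clonality close to 0, whereas a repertoire dominated by one clone approaches 1.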

Multiplexing tetramer preparation and staining. For multiplex MHC-tetramer
staining, each tetramer was labelled with a combination of three metal-labelled
streptavidins. Using ten different metal-labelled streptavidins, 120 possible combinations
(10 choose 3) were generated. Each specific combination was associated
with a different peptide. Each metal-labelled streptavidin (50 µg/ml, 20 µl) was
mixed for each combination using an automated pipetting device (TECAN). In a
96-well plate, 5 µl peptide (1 mM) was added to 100 µl HLA monomer (100 µg/ml,
diluted in PBS), with a different peptide in each well. The plate was exposed to
UV (365 nm) for 10 min for peptide exchange and left overnight at 4 °C. For the
tetramerization, the metal-labelled streptavidin combination (50 µg/ml) assigned to
each peptide-MHC complex was added in three steps (3 × 10 µl) according to the coding
scheme. Then, tetramerized peptide-MHC complexes were incubated with free
biotin (10 µM; Sigma) for 10 min. All different tetramers were combined and
concentrated using a 50-kDa Amicon filter (Millipore) to a final volume of 500 µl.
Then, 500 µl PBS, 1% BSA, 0.02% sodium azide was added. Before staining, the
tetramer cocktail was filtered using a 0.1-µm filter (Millipore).
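
The three-metal coding scheme can be illustrated with a short Python sketch that enumerates all 3-of-10 label combinations and assigns one to each peptide. The label names, peptide names and helper function are hypothetical placeholders, not the reagents or scripts used in the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical names for the ten metal-labelled streptavidins.
LABELS: List[str] = [f"SAv-{i}" for i in range(1, 11)]


def build_coding_scheme(peptides: Sequence[str], labels: Sequence[str],
                        code_size: int = 3) -> Dict[str, Tuple[str, ...]]:
    """Assign each peptide a unique combination of `code_size` labels."""
    codes = list(combinations(labels, code_size))
    if len(peptides) > len(codes):
        raise ValueError("more peptides than available label combinations")
    return dict(zip(peptides, codes))


all_codes = list(combinations(LABELS, 3))          # every possible three-label code
peptides = [f"peptide-{i}" for i in range(1, 6)]   # hypothetical peptide names
scheme = build_coding_scheme(peptides, LABELS)

# Reverse lookup: the set of labels detected on a cell identifies the peptide.
decode: Dict[FrozenSet[str], str] = {frozenset(code): pep for pep, code in scheme.items()}
hit = decode.get(frozenset({"SAv-1", "SAv-2", "SAv-3"}))
```

Because each code is an unordered set of three labels, a cell that is positive for exactly one valid triplet can be assigned to a single peptide specificity, while cells positive for any other number of labels can be set aside as uninterpretable.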

Frozen samples were thawed and washed in RPMI, 10% FBS, 15 µg/ml DNase.
Cells were incubated for 1 h at room temperature with the tetramer cocktail. Cells
were then stained with 5 µM cisplatin (viability marker) in PBS for 5 min as previously
described. Cells were then incubated with antibody cocktail for 15 min
(see Supplementary Table 5 for clone list and metals) and fixed in 2% PFA.

Data analysis and t-SNE. After mass cytometry (CyTOF) acquisition, which was
performed as previously described, any zero values were randomized using
a uniform distribution of values between zero and −1 using an R script (as was
the default operation of previous CyTOF software). Note also that all other integer
values measured by the mass cytometer are randomized in a similar fashion
by default. The signal of each parameter was then normalized based on the EQ
beads (Fluidigm) as previously described. Cells were manually de-barcoded
using FlowJo (Tree Star). Samples were then used for t-SNE analysis similar to
that previously described, using custom R scripts based on the ‘flowCore’ and
‘Rtsne’ packages (using CRAN R packages that perform the Barnes-Hut implementation of
t-SNE). In R, all data were transformed using the logicleTransform function
(flowCore package) using parameters w = 0.25, t = 16409, m = 4.5, a = 0 to roughly
match scaling historically used in FlowJo. For heat maps, median intensity corresponds
to a logical data scale using a formula previously described. The colours in
the heat map represent the measured mean intensity value of a given marker in
a given cluster. A four-colour scale is used, with black-blue indicating low expression
values, green-yellow indicating intermediately expressed markers, and red
representing highly expressed markers.
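
The zero-randomization step can be illustrated with a minimal Python sketch. The study used an R script for this; the function name and example intensities below are hypothetical, and only the replacement of exact zeros by uniform draws between −1 and 0 is shown.

```python
import random
from typing import List, Sequence


def randomize_zeros(values: Sequence[float], rng: random.Random) -> List[float]:
    """Replace exact zeros with draws from a uniform distribution between -1 and 0."""
    return [rng.uniform(-1.0, 0.0) if v == 0 else v for v in values]


# Hypothetical raw intensities for one marker; a seeded generator keeps the sketch reproducible.
rng = random.Random(0)
raw_intensities = [0, 3, 0, 7.5]
spread_intensities = randomize_zeros(raw_intensities, rng)
```

Spreading the zeros in this way avoids a pile-up of identical values at the axis, which matters for display and for the subsequent transformation and dimensionality reduction.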

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Sequence data that support the findings of this study have been
deposited in the Gene Expression Omnibus under accession number GSE113590.
Mass cytometry data have been deposited in FlowRepository
(https://flowrepository.org) under accession link FR-FCM-ZYWM. Further data that support the
findings of this study are available from the corresponding authors upon reasonable
request.
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Extended Data Fig. 1 | Tumour-infiltrating CD8* T cells are 
heterogeneous. a, Mass cytometry dot plots showing expression
of markers by CD8+ TILs (gated on
CD45+ cisplatin−). Data from at least 10 independent mass cytometry
experiments. Representative data from one patient. b, For high-dimensional
assessment of CD8+ T cell heterogeneity, we used t-SNE,
which accounts for nonlinear relationships between markers and projects
high-dimensional data into a low-dimensional space by making a pairwise
comparison of cellular phenotypes to optimally plot similar cells near
each other (see Methods). Parallel analysis of CD8+ T cells from PBMCs
(blue), tumour-adjacent tissue (yellow) and tumour tissue (red) from lung
or colorectal cancer patients was performed, which allows for accurate
comparison of the phenotypes of cells from each of these sample types.
t-SNE map of CD8+ T cells isolated from PBMC (blue), tumour-adjacent
tissue (lung or colon, yellow) and tumour (red). t-SNE was performed
separately on each patient. Data are from at least 10 independent mass
cytometry experiments. Representative data from three patients for
each malignancy. c, t-SNE analyses focusing only on CD8+ T cells from
tumour infiltrates to explore the heterogeneity of CD8+ TILs within
individual patient tumours. t-SNE map of CD8+ TILs isolated from a
colorectal tumour. Data are from at least 10 independent mass cytometry
experiments. Representative data from one patient.
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Extended Data Fig. 2 | Multiplex tetramer staining by mass cytometry. 
To investigate the antigen specificity of CD8+ TILs, we performed
multiplex MHC-tetramer staining as reported previously (see Methods).
By using a three-metal coding scheme, we encoded up to 120 different
tetramers specific for neoantigens, tumour-associated antigens (TAA)
and cancer-unrelated epitopes (peptide list is shown in Supplementary
Tables 1-3). Data are from at least 10 independent mass cytometry
experiments (see Fig. 2). Representative data from three patients. Each
MHC tetramer+ cell population is positive for a unique code, composed of
three differently labelled streptavidin populations.
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LELC tumour-specific CD8* TILs. a, Immunohistochemistry of lung 
adenocarcinoma and lung from LELC, stained with haematoxylin (blue)
and EBV-encoded small RNA in situ hybridization (EBERish, brown).
Primary LELC is rare and often associated with EBV infection in lung
epithelial cells. Using EBERish staining on tissue sections, we confirmed
the presence of EBV in tumour cells from patient A311. Data are
from one experiment. b, Flow dot plot representing two populations
specific for EBV-derived peptides (parts of the BRLF1 and BMLF1
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Extended Data Fig. 6 | Differential gene expression profiles of CD39— 
and CD39+ CD8+ TILs. a, To better characterize CD39+ CD8+
TILs, we sorted and performed transcriptomic profiling on CD39−
and CD39+ CD8+ TILs. Using PCA on the complete transcriptomic data,
we observed a natural ordering of samples from naive to effector memory
PBMCs, then CD39− CD8+ TILs, and finally CD39+ CD8+ TILs along
the PC1 axis (see Fig. 4). We then used GSEA to biologically interpret
PC1. Among all pathways that were significantly upregulated, we found
that CD39+ CD8+ TILs were enriched in pathways related to cell proliferation
and the adaptive immune response, which suggests that these cells were
subjected to higher TCR signalling (see detailed list in Supplementary
Table 5). b, To obtain a more comprehensive overview of the difference
between CD39− and CD39+ CD8+ TILs, we studied gene sets specific
for exhaustion, a pathway characteristic of chronically stimulated T
cells. In line with the hypothesis that CD39 marks CD8+ TILs for
chronic antigen stimulation, the gene set described for exhaustion (c) was
significantly enriched in CD39+ CD8+ TILs in both colorectal and lung
cancer (see Fig. 4).
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Extended Data Fig. 7 | Skewed TCR repertoire between CD39 and 
CD39+ CD8+ TILs. To further explore the specificity of CD39+ CD8+
TILs, we performed TCRα and TCRβ sequencing of CD39− and
CD39+ CD8+ TILs. We assume that a less diverse TCRα or TCRβ
profile in CD39+ CD8+ TILs would suggest tumour antigen-driven
clonal expansion, as suggested previously. a, The clonality index, incorporating
the frequency of each unique TCRα or TCRβ clone in paired samples
(n = 8 patients), indicated a lower TCRα and TCRβ diversity in CD39+
CD8+ TILs. Two-tailed paired t-test. Data are from two independent
experiments. b, c, We also compared TCRα repertoires between these
populations and found that the most highly represented clones were not
shared between CD39− and CD39+ CD8+ TILs. Taken together, the less
diverse and skewed TCRα profile of CD39+ CD8+ TILs supports the notion
that these cells underwent tumour antigen-driven clonal expansion.
Data are from two independent experiments.
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Extended Data Fig. 8 | Exhausted profiles of CD39t CD8* TILs. 

a, Frequencies of the expression of activation markers (left panel) and
inhibitory markers (right panel) by CD39− (blue) and CD39+ (red)
CD8+ TILs in lung tumours. Data are from at least ten independent
mass cytometry experiments. Data are mean ± s.d. Two-tailed paired
t-test (n = 12 to n = 30 patients). b, t-SNE map of CD39+ CD8+ TILs
isolated from a colorectal tumour. t-SNE was performed on data
from one patient. Despite the phenotypic differences between CD39+
and CD39− CD8+ TILs, using t-SNE we observed that the CD39+ CD8+
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TILs were heterogeneous and we could not describe any simple rules 
related to the hierarchical expression of various co-stimulatory receptors,
inhibitory receptors or proliferation markers. c, Mass cytometry dot
plots representing expression of IFNγ, TNFα and IL-2 by CD8+ TILs
plotted against CD39 expression (representative data from one patient
with a colorectal tumour). Data from two independent experiments.
d, Frequency of the expression of IFNγ, TNFα and IL-2 by CD39− (blue)
and CD39+ (red) CD8+ TILs from colorectal tumour. n = 11 patients, data
from two independent experiments. Two-tailed paired t-test.
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Extended Data Fig. 9 | Relationships between frequencies of CD39* 
CD8+ TILs and clinical parameters of colorectal cancer patients.
a, Mass cytometry dot plots representing expression of CD39 by
CD8+ TILs in lung and colorectal tumours. Representative data from
two patients. Data are from at least ten independent mass cytometry
experiments. b, CD39+ cells as a percentage of CD8+ TILs stratified
by microsatellite-stable (MSS) (n = 3 patients) or microsatellite-instable
(MSI) (n = 33 patients) status. Data are means from at least
ten independent mass cytometry experiments. c, Mutation rate (in
mutational events per megabase, plotted on a log scale) versus CD39+
CD8+ TIL frequencies in colorectal tumours (n = 26 patients). Data
from at least ten independent mass cytometry experiments. d, Box plots
representing CD39+ frequencies among CD8+ TILs stratified by the
consensus molecular subtypes of each tumour. CMS1 (n = 6 patients),
CMS2 (n = 34 patients), CMS3 (n = 8 patients), CMS4 (n = 3 patients).
Box plots show the median, box edges represent the first and third
quartiles, and the whiskers extend from minimum to maximum. Data are
from at least ten independent mass cytometry experiments. e, Dot plots
representing CD39+ frequencies among CD8+ TILs stratified by tumour
stages. Stage I (n = 6 patients), stage II (n = 10 patients), stage III (n = 10
patients), stage IV (n = 8 patients). Data are mean ± s.d. from at least ten
independent mass cytometry experiments. f, Number of driver mutations
against CD39+ CD8+ TIL frequencies in colorectal tumours. Two-tailed
t-test; Pearson’s correlation. n = 27 patients. Data are from at least ten
independent mass cytometry experiments.
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Extended Data Fig. 10 | Gene set enrichment in tumours with high 
CD39+ CD8+ TIL count. We investigated transcriptomic profiles of whole
tumours using bulk RNA sequencing in conjunction with the percentage
CD39 expression in CD8+ TILs as measured by mass cytometry.
Among the 25% most varying genes, we identified ten gene modules by
performing hierarchical clustering on the Pearson correlation matrix of the
genes. Notably, a cluster whose expression correlated with the frequency of
CD39+ TILs was enriched in genes related to ‘adaptive immune response’,
‘T cell receptor signalling pathway’ and ‘interferon-gamma mediated
signalling pathway’ (see also Supplementary Table 4). Pathways related to
peptide presentation by MHC molecules were also overrepresented in this
cluster, which contained genes such as class I MHC molecules, TAP1 and
TAP2 molecules and proteasome-related genes. n = 46 patients. Data are
from at least five independent experiments.
Two-sided hypergeometric test.
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Microbial signals drive pre-leukaemic 
myeloproliferation in a Tet2-deficient host 
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Somatic mutations in tet methylcytosine dioxygenase 2 (TET2), which 
encodes an epigenetic modifier enzyme, drive the development of
haematopoietic malignancies. In both humans and mice, TET2
deficiency leads to increased self-renewal of haematopoietic stem
cells with a net developmental bias towards the myeloid lineage.
However, pre-leukaemic myeloproliferation (PMP) occurs in only a
fraction of Tet2−/− mice and humans with TET2 mutations,
suggesting that extrinsic non-cell-autonomous factors are required
for disease onset. Here we show that bacterial translocation and
increased interleukin-6 production, resulting from dysfunction
of the small-intestinal barrier, are critical for the development of
PMP in mice that lack Tet2 expression in haematopoietic cells.
Furthermore, in symptom-free Tet2−/− mice, PMP can be induced
by disrupting intestinal barrier integrity, or in response to systemic
bacterial stimuli such as a toll-like receptor 2 agonist. PMP was
reversed by antibiotic treatment and failed to develop in germ-free
Tet2−/− mice, which illustrates the importance of microbial signals
in the development of this condition. Our findings demonstrate
the requirement for microbial-dependent inflammation in the
development of PMP and provide a mechanistic basis for the
variation in PMP penetrance observed in Tet2−/− mice. This study
will prompt new lines of investigation that may profoundly affect
the prevention and management of haematopoietic malignancies.

TET2 deficiency leads to severe myeloproliferation, extramedullary
haematopoiesis and splenomegaly that mimic pre-leukaemic myeloproliferative
disorders (henceforth referred to as PMP). PMP is
thought to set the stage for the development of overt leukaemia after
the occurrence of additional cooperative oncogenic mutations. To
better understand the pathogenesis of PMP, we set out to determine
the mechanisms that underlie variations in PMP penetrance, which are
observed in around 50-75% of aged Tet2−/− mice that are over 20 weeks
old (Extended Data Fig. 1a). We first developed a peripheral-blood
biomarker that could serve as a robust predictor of disease. We found
that the frequency of CD11b+Gr1+ myeloid cells in the peripheral
blood positively correlates with myeloid proliferation (as defined
by the expansion of myeloid cells in the spleen and peripheral blood)
and splenic extramedullary haematopoiesis markers (an increase
of lineage− Sca1+ c-Kit+ (LSK) cells and splenomegaly) (Extended Data
Fig. 1b-e). Confirming the prognostic robustness of this newly defined
peripheral-blood biomarker of PMP, Tet2−/− mice that were over 20
weeks old and had >16% of CD11b+Gr1+ myeloid cells in their peripheral
blood displayed a PMP phenotype (Extended Data Fig. 1f-h) as


[Fig. 1 image, panels a-f. Recoverable labels: groups WT germ free, Tet2+/+, Tet2−/−, symptom free and PMP; tissues MLN and spleen; culture conditions anaerobic and aerobic; axes ‘Average number of colonies per 10,000 cells’, ‘Plating’ and ‘16S copies per 50 µl of blood’.]


Fig. 1 | Tet2 deficiency leads to systemic bacterial dissemination. 

a, In vitro HSC self-renewal colony-forming assay of haematopoietic
progenitors (n = 3 mice). Mean ± s.e.m. b, Bacterial 16S gene copies in
peripheral blood (left; n = 4, 10 and 37 for germ-free wild-type (WT),
Tet2+/+ and Tet2−/− mice, respectively), MLN (middle; n = 4, 11 and
15 for germ-free wild-type, Tet2+/+ and Tet2−/− mice, respectively) and
spleen (right; n = 4, 11 and 19 for germ-free wild-type, Tet2+/+ and
Tet2−/− mice, respectively). Germ-free wild-type mice served as negative
control. Centre at median, two-tailed Mann-Whitney U-test. Bacterial 16S
rRNA gene qPCRs from about 30 mg MLN and about 30 mg spleen were
normalized to the host murine Ifnb1 gene. c, Correlation between 16S gene
copies in the peripheral blood and numbers (left) or frequency (right)
of peripheral-blood CD11b+Gr1+ myeloid cells (n = 40 mice). Pearson
correlation test. d, Representative image of aerobic and anaerobic cultures.
e, f, Quantification of bacteria colony-forming units (CFUs) of MLN and
spleen suspensions grown under anaerobic (e) or aerobic (f) conditions.
In b, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Data are
representative of at least three independent experiments.
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Fig. 2 | Disruption of intestinal barrier integrity and systemic microbial 
signals are sufficient to drive PMP in the absence of Tet2. a, Intestinal
permeability measurement by blood plasma FITC-dextran concentrations
(n = 13 Tet2+/+ mice or 12 Tet2−/− mice). Centre is mean, two-tailed
unpaired t-test. b, c, Blood plasma FITC-dextran concentration correlates
with spleen weight (b) and numbers of splenic CD11b+Gr1+ myeloid cells
(c) (n = 12 mice). Pearson correlation test. d, e, Symptom-free Tet2f/fVavcre
mice that are over 20 weeks old, and littermate controls, treated with
DSS (+) or given water (−). d, Frequency (left) and numbers (right) of
CD11b+Gr1+ myeloid cells before, during and after DSS treatment
(n = 5 Tet2f/fcre− (Tet2f/f mice without the Cre construct) mice or 8 Tet2f/fVavcre
mice). Mean ± s.e.m., two-tailed unpaired t-test. e, Numbers of
CD11b+Gr1+ myeloid cells (left), LSK cells (middle) and spleen weight
(right) at end point analysis (n = 5 (Tet2f/fcre−, no DSS), 4 (Tet2f/fVavcre,
no DSS), 6 (Tet2f/fcre−, DSS) or 8 (Tet2f/fVavcre, DSS) mice). f, Oligotypes


compared to age-matched Tet2−/− mice with <16% peripheral-blood
CD11b+Gr1+ cells (defined as ‘symptom free’), or littermate controls.

We next determined whether differences in PMP penetrance in
Tet2−/− mice (Extended Data Fig. 1a-h) could be linked to a change in
the haematopoietic stem cell (HSC) self-renewing potential in vitro.
Irrespective of health status, bone marrow and splenic haematopoietic
progenitors from symptom-free Tet2−/− mice and Tet2−/− mice with
PMP displayed similar re-plating efficiencies, which were superior to
those of littermate controls (Fig. 1a and Extended Data Fig. 1i). This
suggests that extrinsic non-cell-autonomous factors are required for
the development of PMP in Tet2−/− mice.

This finding, together with previous studies that report that the systemic
dissemination of bacteria induces extramedullary haematopoiesis
and emergency myelopoiesis, prompted us to investigate whether
bacterial triggers could drive myeloproliferation in the absence of Tet2.
Quantification of the bacterial 16S rRNA gene revealed the presence of
16S gene copies in the peripheral blood, the mesenteric lymph nodes
(MLNs), and the spleen of Tet2−/− mice (Fig. 1b) that correlated with
myelocytosis (Fig. 1c and Extended Data Fig. 1j, k). Furthermore, cultivation
of MLN and spleen suspensions under aerobic and anaerobic
culture conditions confirmed that in about 50% of Tet2−/− mice, live
bacteria, in particular Lactobacillus, were present. Lactobacillus has
previously been reported to be predominantly located in the small
intestine under steady-state conditions (Fig. 1d-f, and Extended Data
Fig. 1l). No bacterial dissemination was observed in littermate controls
(Fig. 1d-f).




identified in MLNs and spleens of Tet2−/− mice are enriched in the
jejunum. Centre is median (n = 5 (Tet2+/+) or 7 (Tet2−/−) mice for
jejunum and colon). g, h, Symptom-free Tet2f/fVavcre mice that are over 20 weeks
old, and littermate controls, treated with the TLR1/2 agonist Pam3CSK4
(+) or given PBS (−). g, Percentage (left) and numbers (right) of
CD11b+Gr1+ myeloid cells before treatment (day 0), during treatment (day 2) and at
end-point analysis (day 14) (n = 6 mice). Mean ± s.e.m., two-tailed paired
t-test, Sidak’s post hoc test. h, Numbers of CD11b+Gr1+ myeloid cells
(left), LSK cells (middle) and spleen weight (right) (n = 5 (Tet2f/fcre−,
no Pam3CSK4), 6 (Tet2f/fcre−, with Pam3CSK4), 7 (Tet2f/fVavcre,
no Pam3CSK4) or 6 (Tet2f/fVavcre, with Pam3CSK4) mice). Centre is
mean, one-way ANOVA, Sidak’s post hoc test. In e, f, centre is median,
Kruskal-Wallis, Dunn’s post hoc test. Data are representative of at least
two independent experiments. *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001.


The existence of spontaneous bacterial translocation in Tet2−/− mice
was unexpected, given the absence of intestinal anomalies (Extended
Data Fig. 2a) and epithelial cell death (Extended Data Fig. 2b, c) in
Tet2−/− mice. However, increased intestinal permeability can occur
in mice and humans with normal intestinal architecture. Consistent
with the observed microbial translocation, Tet2−/− mice displayed a
significant increase in intestinal permeability as assessed by an in vivo
FITC-dextran intestinal permeability assay (Fig. 2a). Increased intestinal
permeability positively correlated with PMP severity in Tet2−/−
mice (Fig. 2b, c and Extended Data Fig. 3a), suggesting that disruption
of intestinal barrier integrity may be sufficient to initiate PMP in
symptom-free Tet2−/− mice. Accordingly, administration of dextran
sodium sulfate (DSS), a compound known to alter intestinal barrier
function, caused excessive myeloproliferation and extramedullary
haematopoiesis that persisted more than one month after cessation
of treatment in symptom-free Tet2f/fVavcre mice, but not in littermate
controls (Fig. 2d, e and Extended Data Fig. 3b-d). Tet2f/fVavcre mice
(Tet2 deleted in haematopoietic cells including progenitors) were used
because somatic TET2 mutations occur in the haematopoietic compartment
in humans. These results suggest that maintaining intestinal
barrier function is important for preventing PMP in the context of Tet2
deficiency in haematopoietic cells.

Whole transcriptome sequencing analysis of the intestine of
Tet2−/− and littermate control mice revealed major transcriptional
alterations in the jejunum, but minor changes were also detected in
the colon (Extended Data Fig. 4a). This finding is consistent with the
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Fig. 3 | Microbial signals are required for PMP in Tet2~'~ mice. 

a, b, Germ-free (GF) and SPF-housed (SPF) Tet2−/− mice that are over
40 weeks old, and littermate controls, analysed for percentage (a, left) and
numbers (a, right) of CD11b+Gr1+ myeloid cells (n = 6 (GF Tet2+/+),
7 (GF Tet2−/−) or 7 (SPF Tet2−/−) mice), and numbers of CD11b+Gr1+
myeloid cells (b, left), LSK cells (b, middle), and spleen weight (b, right)
(n = 7 (GF Tet2+/+), 4 (GF Tet2−/−) or 5 (SPF Tet2−/−) mice). c, Schematic
for antibiotics (ABX) treatment regimen. Longitudinal analysis described
in f, g and Extended Data Fig. 8d; prevention described in Extended
Data Fig. 8c; and reversal described in d, e and Extended Data Fig. 9a-j.
EPA, end point analysis. d, e, Tet2−/− mice with PMP that are over 20
weeks old, and littermate controls, analysed for percentage (d, left) and
numbers (d, right) of CD11b+Gr1+ myeloid cells (n = 13 mice). Lines
connect values obtained from the same mouse sampled before (−)
and after (+) ABX treatment. Two-tailed paired t-test. e, Numbers of

observation that Lactobacillus spp. residing in the small intestine were
predominantly found in peripheral organs of Tet2−/− mice. Consistent
with Tet2-deficient mice displaying increased intestinal permeability
(Fig. 2a), the expression of genes known to maintain intestinal immune
homeostasis and barrier function was altered in the jejunum (Extended
Data Fig. 4b). In particular, the tight junction protein zonula occludens-1 (ZO-1),
which is critical in regulating the paracellular leakage pathway, was
reduced in the jejunum but not in the colon of Tet2-deficient mice
(Extended Data Fig. 4c, d). In contrast to previous observations that
suggested that selective deletion of Tet1 in epithelial cells leads to
defects in intestinal permeability, Tet2f/fVavcre mice, but not Tet2f/fVillincre
mice (Tet2 deleted in epithelial cells), displayed defects in
gut barrier function (Extended Data Fig. 5a-c). Furthermore, Tet2f/fLysMcre
(LysM is also known as Lyz2) mice (Tet2 deleted in mature
myeloid cells) failed to show an increase in intestinal permeability
(Extended Data Fig. 5a-c). Consistent with these findings, and as previously
shown, PMP developed only in Tet2f/fVavcre mice (Extended
Data Fig. 5d-f). These observations suggest that Tet2 deficiency in the
haematopoietic compartment can promote small-intestinal barrier
dysfunction that causes bacterial translocation.

To further assess whether the translocated bacteria were primarily
of jejunal origin, we matched oligotypes of identified bacterial
strains in the spleen and MLN of Tet2−/− mice with bacteria residing
in the different intestinal compartments (Extended Data Fig. 1j and
Supplementary Table 4). Consistent with the proposed hypothesis,
Lactobacillus reuteri #5529 (number refers to oligotype), Lactobacillus
johnsonii #5433 and Lactobacillus intestinalis #0092 were substantially


CD11b+Gr1+ myeloid cells (left; n = 20 (Tet2+/+, no ABX treatment), 20
(Tet2−/−, no ABX treatment), 6 (Tet2+/+, with ABX treatment) and 13
(Tet2−/−, with ABX treatment) mice), LSK cells (middle; n = 11 (Tet2+/+,
no ABX treatment), 13 (Tet2−/−, no ABX treatment), 6 (Tet2+/+, with
ABX treatment) and 12 (Tet2−/−, with ABX treatment) mice) and spleen
weight (right; n = 14 (Tet2+/+, no ABX treatment), 15 (Tet2−/−, no ABX
treatment), 6 (Tet2+/+, with ABX treatment) and 14 (Tet2−/−, with ABX
treatment) mice). f, g, Tet2−/− mice analysed for frequency (f, left) and
number (f, right) of CD11b+Gr1+ myeloid cells and (g) 16S gene copies
before, during and after ABX treatment (n = 7 mice). Mean ± s.e.m.,
repeated measures one-way ANOVA, Sidak's post hoc test. a, b, e, Centre
is mean, one-way ANOVA, Sidak's post hoc test. Data are representative of
at least two independent experiments; *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001.


enriched in the jejunum independent of the genotype, when compared 
to the colon (Fig. 2f). 

We first assessed whether TLR2 agonists, which are cell wall components
of several Lactobacillus strains, are relevant bacterial signals in
Tet2−/− mice (Extended Data Fig. 6a), and then showed that the
TLR2 agonist Pam3CSK4 was sufficient to induce extensive
myeloproliferation in symptom-free Tet2f/fVavcre mice but not in littermate
controls (Fig. 2g, h and Extended Data Fig. 6b–d), without affecting
intestinal barrier integrity (Extended Data Fig. 6e). Of note, L. reuteri
#5529 and L. johnsonii #5433 are also found in the small intestine of
littermate control mice (Fig. 2f) and 16S microbiome profiling failed
to detect significant differences in microbial structures between
genotypes (Extended Data Fig. 7a–d). Furthermore, variations in
disease penetrance were observed among cohoused Tet2f/fVavcre mice
(Extended Data Fig. 7e). These observations indicate that changes in
microbial structures do not drive PMP and that systemic microbial
signals, such as TLR2 agonists, may be sufficient to promote PMP
independently of intestinal barrier dysfunction in Tet2-deficient mice.
This is consistent with previous epidemiological studies that report an
association of clonal expansion and myeloid malignancies with
antecedent chronic inflammatory conditions of infectious origin.

To demonstrate that microbial signals are required for the
development of PMP, we raised Tet2−/− mice under germ-free conditions.
Forty-week-old Tet2−/− germ-free mice failed to show signs of PMP in
the peripheral blood and the spleen, in contrast to age-matched
specific-pathogen-free (SPF)-raised Tet2−/− mice (Fig. 3a, b and Extended
Data Fig. 8a, b). Furthermore, treatment with antibiotics (Fig. 3c) both
Fig. 4 | Bacteria-induced IL-6 is required for PMP in Tet2−/− mice.

a, IL-6 cytokine levels in blood plasma (n = 7 (Tet2+/+, SPF), 26 (Tet2−/−,
SPF), 6 (Tet2+/+, GF) or 7 (Tet2−/−, GF) mice). Centre is median,
Kruskal–Wallis, Dunn's post hoc test. b, IL-6 cytokine levels in blood plasma
correlate with the frequency (left) and numbers (right) of CD11b+Gr1+
myeloid cells (n = 49 mice). Pearson correlation test. c, In vitro HSC
self-renewal colony-forming assay of haematopoietic progenitors of the spleen
from Tet2−/− mice (red lines) and littermate controls (blue lines) in the
presence of anti-IL-6 antibody (anti-IL-6) or isotype control (ISO) after
the first replating (n = 3 mice). d, Representative histograms, percentages
and quantification of mean fluorescence intensity (MFI) of IL-6Rα+
c-Kit+Sca-1− (lineage−c-Kit+ (LK) gated) CD34+FcγRIII/II+ (FcγRIII
and FcγRII are also known as CD16 and CD32, respectively) GMPs
and IL-6Rα+ LK gated CD34−FcγRIII/II− megakaryocyte–erythroid
progenitors (MEP) from the spleen (n = 5 (Tet2+/+, SPF), 6 (Tet2−/−,
SPF), 4 (Tet2+/+, GF) or 3 (Tet2−/−, GF) mice). e, Stat3 phosphorylation
(pY705) after 30-min stimulation with IL-6 in splenic c-Kit+Sca-1−


prevented (Extended Data Fig. 8c) and reversed PMP (Fig. 3d, e and
Extended Data Fig. 9a–j) in Tet2−/− mice. The longitudinal study that
was performed in the presence of antibiotics and after antibiotic
withdrawal (Fig. 3c), using the peripheral-blood PMP marker (Fig. 3f, g and
Extended Data Fig. 8d), indicated that the potential for PMP remains
and is directly linked to the presence of bacterial signals.

Interleukin 6 (IL-6) is a critical activator of myelopoiesis in response
to systemic bacterial dissemination and can be upregulated in chronic
myeloproliferative disease in humans. Accordingly, IL-6 was
significantly increased in both the plasma and spleen of Tet2−/− mice with
PMP, as compared to littermate controls (Fig. 4a and Extended Data
Fig. 10a). The increase in IL-6 was microbiota-dependent and induced
upon DSS and TLR2-agonist treatment in symptom-free Tet2-deficient
mice that subsequently developed PMP (Fig. 4a and Extended Data
Fig. 10a–d). Furthermore, peripheral-blood myeloid cell expansion
correlated with IL-6 levels in plasma (Fig. 4b and Extended Data
Fig. 10e). We next investigated the role of IL-6 in PMP in the context
of Tet2 deficiency in vitro and in vivo. Of note, IL-6 is used in in vitro
HSC self-renewing assays (Fig. 1a and Extended Data Fig. 1i). The
neutralization of IL-6 significantly reduced the increased self-renewing
capacity of Tet2−/− haematopoietic progenitors in vitro, which indicates
a critical role for IL-6 in PMP, but IL-6 neutralization did not affect
the limited self-renewing capacity of Tet2-sufficient haematopoietic
(LK gated) CD34+ (solid lines) and CD34− (dotted lines) myeloid
progenitors (n = 5 (blue line), 6 (red line) and 4 mice (dotted lines)).
f, Numbers of IL-6Rα+ GMP and megakaryocyte–erythroid progenitors
from the spleen (n = 5 (Tet2+/+, SPF), 6 (Tet2−/−, SPF), 4 (Tet2+/+, GF) or
3 (Tet2−/−, GF) mice). g, Schematic for anti-IL-6 treatment of Tet2−/− mice
with PMP that are over 20 weeks old, and littermate controls. See data
in h, i and Extended Data Fig. 10o, p. h, Percentage (left) and numbers
(right) of CD11b+Gr1+ myeloid cells at day 0 and day 21 (n = 6 mice).
Lines connect values obtained from the same mouse sampled before
and after anti-IL-6 treatment. Two-tailed paired t-test. i, Numbers of
CD11b+Gr1+ myeloid cells (left), LSK cells (middle) and spleen weight
(right) (n = 11 (Tet2+/+, with anti-IL-6), 6 (Tet2−/−, with anti-IL-6) or 9
(Tet2−/−, without anti-IL-6) mice). c, e, Mean ± s.e.m., two-way ANOVA,
Sidak's post hoc test. d, f, i, Centre is mean, one-way ANOVA, Sidak's post
hoc test. Data are representative of at least two independent experiments.
*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.


progenitors (Fig. 4c and Extended Data Fig. 10f, g). Consistent with
this finding, IL-6Rα surface expression was significantly increased on
Tet2−/− granulocyte–macrophage progenitors (GMPs) as compared
to Tet2+/+ GMPs (Fig. 4d and Extended Data Fig. 10h) and associated
with an increased sensitivity of Tet2−/− GMPs to IL-6 as assessed by
Stat3 phosphorylation (Fig. 4e and Extended Data Fig. 10i). Although
an increase in IL-6Rα expression was observed in germ-free Tet2−/−
GMPs, the cellular expansion of IL-6Rα+ GMPs occurred only in
SPF-housed Tet2−/− mice (Fig. 4f). This suggests that the increase in IL-6Rα
expression is intrinsic to the cell and independent of microbial signals,
whereas the expansion of IL-6Rα+ GMPs requires the microbiota and
IL-6 (Extended Data Fig. 8b). Importantly, megakaryocyte–erythroid
progenitors did not display increased IL-6Rα expression and signalling
(Fig. 4d, e) and consequently failed to expand in Tet2−/− mice under
SPF conditions (Fig. 4f). These data suggest that the development of
PMP requires an exogenous microbial signal and an upregulation of
IL-6Rα on GMPs that is driven in a cell-autonomous manner by Tet2
deficiency. Of note, increased IL-6 production in the plasma and spleen
(Fig. 4a and Extended Data Fig. 10a) of Tet2−/− mice correlated with
an increase in the number of myeloid cells capable of producing IL-6
(Extended Data Fig. 10j), rather than an increase in IL-6 production
by mature myeloid cells (Extended Data Fig. 10k, l). No increase in
IL-6Rα expression in mature splenic Tet2−/− CD11b+Gr1+ myeloid
cells was observed (Extended Data Fig. 10m), which suggests that
IL-6-mediated PMP in Tet2-deficient mice is determined at the myeloid
progenitor stage. Consistent with this observation, Tet2f/fVavcre but
not Tet2f/fLysMcre mice developed PMP (Extended Data Figs. 5d–f,
10n). Finally, in vivo IL-6 neutralization experiments established that
IL-6 was required for PMP to develop in Tet2−/− mice (Fig. 4g–i and
Extended Data Fig. 10o) independently of barrier function restoration
(Extended Data Fig. 10p, q). These findings provide experimental
support to studies in humans that suggest a role for IL-6 in the pathogenesis
of haematological malignancies.

Our study demonstrates a critical role for microbial-mediated
inflammatory (IL-6) signals in the development of PMP in the context
of Tet2 deficiency. More specifically, these data suggest an
IL-6-dependent cycle that is engaged upon bacterial translocation and leads
to PMP in Tet2−/− mice through the expansion of GMPs that express high levels
of IL-6Rα in the absence of Tet2 (see Extended Data Fig. 10r for a
schematic of this model). The mechanisms through which Tet2 deficiency
in haematopoietic cells leads to a microbiota-dependent impairment
of gut barrier function remain to be addressed.

This work raises the question of whether microbial-dependent
inflammatory mediators, such as IL-6, are critical contributors to myeloid
malignancies in humans by promoting PMP in patients with somatic
TET2 mutations. Furthermore, our findings, in combination with a
report indicating that the restoration of Tet2 expression after PMP has
developed prevents secondary evolution towards leukaemia in Tet2−/−
mice, suggest that blocking inflammatory bacterial signals in patients
with TET2 deficiency and PMP may reduce the risk of progression to
haematopoietic malignancies. Whether bacterial infections and events
promoting the disruption of intestinal barrier function can create a
permissive environment that evokes the acquisition of cooperative
oncogenic mutations that lead to the development of leukaemia remains to
be determined. Irrespectively, our study will motivate a new line of
investigations and the design of novel therapeutic strategies that target IL-6
in patients with PMP linked to somatic TET2 mutations to revert
myeloproliferation and prevent the development of myeloid malignancies.


METHODS

Mice. Tet2−/− mice have previously been described. Tet2f/f (B6;129S-Tet2tm1.1Iaai/J)
mice were crossed with the haematopoietic-specific Vav-cre line
(B6.Cg-Tg(Vav1-icre)A2Kio/J), the villus and crypt small and large intestinal
epithelial cell-specific Villin-cre line (B6.Cg-Tg(Vil1-cre)997Gum/J), and the
myeloid cell lineage-specific LysM-cre line (Lyz2tm1(cre)Ifo), respectively.
For the experiments, >20-week-old Tet2−/− mice and littermate controls were
used, unless indicated otherwise. Both female and male mice were used for
experiments; no notable sex-dependent differences were found for the reported
experiments. Mice were housed at the University of Chicago animal facilities
under specific pathogen-free (SPF) conditions, where cages were changed on a
weekly basis; ventilated cages, bedding, food and water (non-acidified) were
autoclaved before use, ambient temperature was maintained at 23 °C, and 5%
Clidox-S was used as a disinfectant. Experimental breeding cages were randomly
housed on two different racks in the vivarium, and all cages were kept on
automatic 12-h light/dark cycles. Germ-free Tet2−/− mice were generated by
two-stage embryo transfer, as previously described, bred to germ-free
C57BL/6 wild-type mice to generate littermate controls and maintained in
flexible film isolators in McMaster's Axenic Gnotobiotic Unit. Animal husbandry
for both SPF and germ-free facilities, and experimental procedures, were
performed in accordance with Public Health Service policy and approved by the
University of Chicago Institutional Animal Care and Use Committees and the
McMaster University Animal Care Committee.

Gnotobiotic animal husbandry. Food, bedding and water (non-acidified) were 
autoclaved before transfer into the sterile isolators. Cages within isolators were 
changed weekly, and all the cages in the vivarium were kept on 12-h light/dark 
cycles. Microbiology testing of faecal (experimental mice) or of caecum samples 
(sentinel mice; aerobic and anaerobic culture, 16S qPCR) was performed every 
other week to confirm germ-free status. 

In vivo anti-IL-6 antibody treatment. Peripheral blood of >20-week-old Tet2−/−
mice and littermate controls was analysed for myelopoiesis by flow cytometry.
Tet2−/− mice selected for increased myelopoiesis (>16% CD11b+Gr1+) in
peripheral blood, and littermate controls, were subjected to weekly intraperitoneal
injections of 1 mg anti-mouse IL-6 (MP5-20F3, Bio X Cell, BE0046) or 1 mg rat IgG
isotype control (HRPN, Bio X Cell, BE0088), respectively. Two Tet2−/− mice that
did not respond to the anti-mouse IL-6 treatment (based on blood screening) were
excluded from the study. After three weeks, mice were euthanized and analysed
for signs of PMP in the peripheral blood and spleen.

In vivo Pam3CSK4 treatment. Peripheral blood of >20-week-old Tet2f/fVavcre
mice and littermate controls was analysed for myelopoiesis by flow cytometry.
Mice selected with no signs of myelopoiesis (<16% CD11b+Gr1+) in peripheral
blood and <30 g body weight were injected intraperitoneally with 100 µg TLR1/2
agonist Pam3CSK4 (InvivoGen, tlrl-pms) every four days. On day 16, mice were
euthanized and analysed for signs of PMP in the peripheral blood and spleen.

DSS-induced colitis model. Peripheral blood of >20-week-old Tet2f/fVavcre mice
and littermate controls was analysed for myelopoiesis by flow cytometry. Mice
selected with no signs of myelopoiesis (<16% CD11b+Gr1+) in peripheral blood
and <30 g body weight underwent DSS colitis treatment as described. In brief,
mice received 3% DSS (MW: 36,000–50,000; MP Biomedicals) (w/v) in drinking
water for 7 days and were then switched to regular drinking water. Weight was
recorded every other day throughout the study. Mice were euthanized 30 days
post-DSS feeding and analysed for PMP.

Antibiotic treatment. Mice were treated with an antibiotic cocktail as previously
described. All antibiotics used are listed as follows (concentration used, company,
catalogue number). In detail, in the first week mice received a daily intragastric
gavage with 100 µl of a mixture of kanamycin (4 mg ml−1, Sigma-Aldrich, 60615),
gentamicin (0.35 mg ml−1, Sigma-Aldrich, G1914), colistin (8,500 U ml−1,
Sigma-Aldrich, C4461), metronidazole (2.15 mg ml−1, Sigma-Aldrich, M3761), and
vancomycin (0.45 mg ml−1, Sigma-Aldrich, V2002). For the following three weeks
antibiotics were administered in the autoclaved (non-acidified) drinking water
at 50-fold dilution except for vancomycin, which was maintained at 0.5 mg ml−1.
Autoclaved (non-acidified) water was used to dissolve antibiotics. Antibiotic water
was prepared fresh and replaced weekly.
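
As a rough illustration of the dilution arithmetic in this regimen, the drinking-water concentrations can be derived from the gavage concentrations listed above. This is only a sketch: the dictionary and function names are hypothetical, and the vancomycin value simply reuses the 0.5 mg ml−1 stated in the text rather than a computed dilution.

```python
# Gavage concentrations as listed in the text: (value, unit per ml).
GAVAGE_CONCENTRATIONS = {
    "kanamycin": (4.0, "mg"),
    "gentamicin": (0.35, "mg"),
    "colistin": (8500.0, "U"),
    "metronidazole": (2.15, "mg"),
    "vancomycin": (0.45, "mg"),
}
DILUTION_FACTOR = 50          # 50-fold dilution in drinking water
VANCOMYCIN_IN_WATER = 0.5     # mg per ml, kept fixed according to the text


def drinking_water_concentrations(gavage=GAVAGE_CONCENTRATIONS):
    """Return per-ml concentrations for the drinking-water phase."""
    water = {}
    for name, (value, unit) in gavage.items():
        if name == "vancomycin":
            # Vancomycin is the stated exception to the 50-fold dilution.
            water[name] = (VANCOMYCIN_IN_WATER, unit)
        else:
            water[name] = (value / DILUTION_FACTOR, unit)
    return water
```

For example, `drinking_water_concentrations()["kanamycin"]` gives 0.08 mg per ml under these assumptions.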

In vivo intestinal permeability measurement. An in vivo assay to determine
intestinal epithelial barrier permeability was performed using a FITC-labelled
dextran method as previously described. In detail, food and water were
withdrawn from mice for 4 h; mice were then weighed and received, by oral
gavage, 60 mg per 100 g body weight of a freshly prepared FITC-dextran
(MW 4,000, Sigma-Aldrich, 46944) solution diluted to 60 mg ml−1 in sterile PBS.
Blood was collected by cheek bleeding (~300–500 µl) 3 h post-gavage and plasma
was collected by centrifugation of blood samples at 2,000g for 10 min at 4 °C.
Fifty microlitres of blood plasma was transferred (in duplicate) into a
flat-bottom 96-well plate (Corning, 3370) and analysis of FITC-dextran
concentration was performed with a fluorescence spectrophotometer setup
(Synergy, BioTek) with emission and excitation wavelengths of 520 nm and
490 nm, respectively. Plasma samples were protected from light at all times.
FITC-dextran concentration was determined from a standard curve
generated by serial dilution of FITC-dextran.
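
The dosing described above implies a simple relationship between body weight and gavage volume. The following minimal sketch works through that arithmetic; the function name is hypothetical and the calculation assumes the dose and stock concentration exactly as stated in the text.

```python
DOSE_MG_PER_100_G = 60.0   # FITC-dextran dose per 100 g body weight
STOCK_MG_PER_ML = 60.0     # concentration of the gavage solution


def gavage_volume_ul(body_weight_g):
    """Volume of FITC-dextran solution (in microlitres) for one mouse."""
    dose_mg = DOSE_MG_PER_100_G * body_weight_g / 100.0
    volume_ml = dose_mg / STOCK_MG_PER_ML
    return volume_ml * 1000.0
```

With these values the volume works out to 10 µl per gram of body weight, so a 25 g mouse would receive about 250 µl.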

Tissue collection and cell purification. Spleen, MLNs and bones (tibia and femur,
both sides) were collected under a laminar flow hood with autoclaved tools under
sterile conditions and the weight of the spleen was recorded. Peripheral blood was
collected by cheek bleeding (after sterilizing cheeks with 70% ethanol wipes) into
EDTA-coated (Thermo Fisher Scientific, MT-46034CI) 1.5-ml tubes. To obtain a
single cell suspension, spleen and MLNs were mashed through a 70-µm cell strainer
and bone marrow was flushed out with syringes filled with PBS and also mashed
through a 70-µm cell strainer. Erylysis of 10 µl peripheral blood, spleen and bone
marrow was performed using the Mouse Erythrocyte Lysing Kit (R&D Systems,
WL2000). For analysis of splenic and bone marrow haematopoietic precursors
the mouse Lineage Cell Depletion Kit (Miltenyi Biotec, 130-090-858) was used.

Antibodies and flow cytometry. Single cell suspensions were pelleted and
resuspended in FACS buffer (PBS, 2% FCS) for immunostaining and subsequent
flow cytometry analysis. Excluding the haematopoietic progenitor cell
staining, all cell suspensions were incubated with Fc Block (BD, 553142) before
staining with fluorophore-conjugated monoclonal antibodies. All
fluorophore-conjugated antibodies used are listed as follows (clone, company,
catalogue number):
Gr-1 (RB6-8C5, eBioscience, 45-5931-80), c-Kit (2B8, BD Biosciences, 553356),
CD11b (M1/70, eBioscience, 25-0112-82), CD34 (RAM34, eBioscience, 11-0341-85),
CD16/CD32 (FcγRIII/II) (93, eBioscience, 25-0161-81), CD19 (1D3, BD
Biosciences, 563148), CD3 (17A2, BD Biosciences, 564009), B220 (RA3-6B2,
BD Biosciences, 563708), TER-119 (TER-119, BD Biosciences, 563323), CD48
(HM48-1, BD Biosciences, 561242), CD135 (A2F10.1, BD Biosciences, 562898),
Ly-6C (AL-21, BD Biosciences, 560595), Sca-1 (D7, Biolegend, 108108), CD150
(TC15-12F12.2, Biolegend, 115922), CD126 (D7715A7, BD Biosciences, 740038)
and Rat IgG2b (R35-38, BD Biosciences, 562603). The LIVE/DEAD Fixable Aqua
Dead Cell Stain Kit was purchased from Life Technologies (L34966). CD11b+Gr1+
myeloid cells were gated on live CD45+ cells and lymphocytes are defined as live
CD45+ gated CD11b− cells. For phospho-flow, cells were stimulated with 0, 1 or
10 ng µl−1 murine IL-6 (Biolegend, 575704) for 30 min at 37 °C, 5% CO2, fixed
with Fix Buffer I (BD Biosciences, 557870) for 10 min at 37 °C, stained with Sca-1
(Biolegend, 108126) for 20 min at 4 °C, permeabilized with Perm Buffer III
(BD Biosciences, 558050) and stained for phospho-Stat3 (pY705, BD Biosciences,
612569), c-Kit, CD34, CD11b, Gr1, CD19 and B220 for 30 min at room
temperature. Flow cytometry analysis was performed with a nine-colour BD FACSCanto
(BD Biosciences) and Aria Fusion (BD Biosciences; for cell sorting) using FlowJo
software (Treestar).

RNA processing and RT-PCR. Small and large intestines were removed and
transferred into cold PBS. A piece (~5 mm) of whole intestinal tissue (duodenum,
jejunum, ileum and colon) was soaked in RNAlater (Qiagen, 76106) at 4 °C for 48 h
and then stored at −80 °C until further analysis. For RNA extraction a Tissue-Tearor
Homogenizer (Biospec) was used. RNA was prepared using the RNeasy Mini Kit
(Qiagen, 74136). cDNA synthesis was performed using GoScript (Promega, A5004)
according to the manufacturer's instructions. Expression analysis was performed in
duplicate via RT-PCR on a Roche LightCycler 480 using SYBR Green (Clontech,
639265). Expression levels were quantified and normalized to Gapdh expression
using the following primer pairs (all mouse):

Gapdh forward: 5′-AGGTCGGTGTGAACGGATTTG-3′; Gapdh reverse: 5′-TGTAGACCATGTAGTTGAGGTCA-3′;
Dsp forward: 5′-TACACCTCAGGGCTGGAAAC-3′; Dsp reverse: 5′-GGGCCAGTCTTAGCTCCTCT-3′;
Retnlb forward: 5′-ATGAAGCCTACACTGTGTTTCC-3′; Retnlb reverse: 5′-CTGCCAGAAGACGTGACACT-3′;
Tjp1 forward: 5′-ACTCCCACTTCCCCAAAAAC-3′; Tjp1 reverse: 5′-CCACAGCTGAAGGACTCACA-3′;
Ang4 forward: 5′-GGTTGTGATTCCTCCAACTCTG-3′; Ang4 reverse: 5′-CTGAAGTTTTCTCCATAAGGGCT-3′;
Ocln forward: 5′-ACTGGGTCAGGGAATATCCA-3′; Ocln reverse: 5′-TCAGCAGCAGCCATGTACTC-3′;
Il6 forward: 5′-CCAAGAGGTGAGTGCTTCCC-3′; Il6 reverse: 5′-CTGTTGTTCAGACTCTCTCCCT-3′

RNA sequencing processing and data analysis. RNA sequencing libraries were
prepared using the Illumina TruSeq protocol and sequenced with single-end
100-bp reads on an Illumina HiSeq2500. Adaptor sequences and low quality score
bases were first trimmed using Trim Galore. The resulting reads were aligned to
the mouse genome reference sequence (GRCm38/mm10) using the TopHat2
software package with a TopHat transcript index from Ensembl. The number of read
fragments overlapping with annotated exons of genes was tabulated using HTSeq
with the following parameters: -q -m intersection-nonempty -s no. Non-coding
or low-expressed genes with an average read count lower than 10 were discarded,
resulting in 15,045 genes in total. Using normalized gene counts for three Tet2+/+
and three Tet2−/− samples for each of the four tissue compartments, we identified
differentially expressed genes using the R package limma. Our linear model had
the following design: Expression ~ Genotype:Tissue. Once we identified jejunum
as the tissue with the largest differences in gene expression between Tet2+/+ and
Tet2−/− samples, we increased our sample size for this tissue for a total of seven
Tet2+/+ and seven Tet2−/− jejunum samples. We corrected for batch effects between
the first and second set of samples using ComBat. We considered a gene as
differentially expressed if statistically supported at a Q-value false discovery rate
<0.1 and a |log2(fold change)| > 0.5.
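
The two filtering rules stated above (the low-expression filter and the differential-expression cut-offs) can be sketched as follows. This is an illustrative Python sketch rather than the limma workflow used in the study; the `GeneResult` class, function names and any gene identifiers passed to them are hypothetical.

```python
from dataclasses import dataclass

FDR_CUTOFF = 0.1       # Q-value threshold stated in the text
LOG2FC_CUTOFF = 0.5    # |log2(fold change)| threshold stated in the text
MIN_MEAN_COUNT = 10    # genes with a lower average read count are discarded


@dataclass
class GeneResult:
    gene: str                # hypothetical gene identifier
    log2_fold_change: float
    q_value: float


def passes_expression_filter(counts):
    """Keep a gene only if its average read count is at least MIN_MEAN_COUNT."""
    return sum(counts) / len(counts) >= MIN_MEAN_COUNT


def is_differentially_expressed(result):
    """Apply both cut-offs: Q-value below 0.1 and |log2FC| above 0.5."""
    return (result.q_value < FDR_CUTOFF
            and abs(result.log2_fold_change) > LOG2FC_CUTOFF)


def select_de_genes(results):
    """Return identifiers of genes that satisfy both cut-offs, in input order."""
    return [r.gene for r in results if is_differentially_expressed(r)]
```

Both conditions must hold, so a gene with a very small Q-value but a small fold change is not reported.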

Bacterial cultures and species identification. Organs were aseptically removed,
weighed and transferred immediately to an anaerobic chamber (Coy Systems)
for manual homogenization in 300 µl of pre-reduced (autoclaved, sterile filtered
(0.22 µm), oxygen reduced with vacuum degasification) PBS + 0.1% L-cysteine
(Sigma-Aldrich, 168149). Homogenates (100 µl per plate) were plated onto BD
BBL Brucella agar with 5% sheep blood, Hemin and vitamin K1 agar plates
(Fisher Scientific, L97848) and incubated anaerobically. For aerobic cultures
tissue homogenates were plated onto Teknova brain heart infusion (BHI) agar plates
(Fisher Scientific, 50-841-098) and incubated aerobically. Plates were sealed with
parafilm and incubated upside down and after 48 h (aerobic) or 120 h (anaerobic)
the colony-forming units (CFUs) were quantified. Results are displayed as CFUs
per gram (g) tissue.
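
One way to convert a plate count into CFUs per gram of tissue is sketched below. The text does not spell out the calculation, so this assumes the homogenate was plated undiluted and that the tissue itself adds negligible volume to the 300 µl of buffer; the function name is hypothetical.

```python
HOMOGENATE_VOLUME_UL = 300.0   # buffer volume used for homogenization
PLATED_VOLUME_UL = 100.0       # homogenate volume spread per plate


def cfu_per_gram(colonies, tissue_weight_g,
                 homogenate_volume_ul=HOMOGENATE_VOLUME_UL,
                 plated_volume_ul=PLATED_VOLUME_UL):
    """Scale a plate count to the whole homogenate, then divide by tissue weight."""
    if tissue_weight_g <= 0:
        raise ValueError("tissue weight must be positive")
    cfu_in_homogenate = colonies * homogenate_volume_ul / plated_volume_ul
    return cfu_in_homogenate / tissue_weight_g
```

Under these assumptions, 20 colonies from a 0.1 g sample correspond to roughly 600 CFUs per gram.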

Representative bacterial colonies grown on agar plates were picked with sterile
pipette tips and stored at −80 °C until analysis. On the day of analysis, picked
bacterial colonies were thawed at room temperature, resuspended in 6 µl of sterile
water and bacteria were lysed by heating the samples to 95 °C for 10 min. Samples
were subsequently cooled down to 4 °C and then the DNA (2 µl) was used as template
DNA in PCR reactions amplifying the 16S rRNA gene using universal bacterial
16S rRNA primers (27F, 5′-AGAGTTTGATCMTGGCTCAG-3′ and 1525R,
5′-AAGGAGGTGATCCAGCC-3′) with reaction conditions: 95 °C for 5 min
followed by 35 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 2 min and then
72 °C for 20 min. The amplification product (10 µl) was incubated with 2 µl
ExoSAP-IT (Thermo Fisher, 78200.200.UL) at 37 °C for 15 min, followed by 80 °C
for 15 min. Amplicons were sequenced by capillary sequencing, and the resulting
sequences were analysed using BLASTN and the 16S ribosomal RNA sequences
database for species identification.

Bacterial lysates. Individual bacterial strains were grown anaerobically
in lactobacillus MRS broth (Fisher Scientific, DF0881-17-5) for 48 h,
and sonicated for 3 × 20 s in cold PBS. The homogenate was centrifuged
at 10,000g for 20 min to remove debris and filtered through a 0.2-µm
filter. 16S copies of the homogenate were quantified by qPCR using the
universal primers (340F, 5′-ACTCCTACGGGAGGCAGCAGT-3′) and (514R,
5′-ATTACCGCGGCTGCTGGC-3′) and purified genomic DNA from Blautia
producta (Prevot) (ATCC, 27340D-5) as a standard.

HEK-TLR2 Quanti-Blue secreted embryonic alkaline phosphatase reporter
assay. HEK-Blue TLR2 cells (provided by the Gajewski laboratory) were plated
in HEK-Blue detection medium (InvivoGen, hb-det2) containing a substrate for
alkaline phosphatase at 5 × 10⁴ cells per well in 96-well plates and incubated with
different concentrations of Pam3CSK4 (InvivoGen, tlrl-pms), LPS (InvivoGen,
tlrl-peklps) or bacterial lysates normalized to 16S rRNA copies. Alkaline phosphatase
activity was measured after 5 h by reading optical density at 620 nm.

DNA extraction from tissue or faeces for 16S qPCR. Total DNA was extracted
from organs (collected under sterile conditions, ~30 mg spleen, ~30 mg MLN and
50 µl peripheral blood) and faeces using the DNeasy Blood & Tissue Kit (Qiagen,
69504) and the Fast DNA Stool Mini Kit (Qiagen, 51604), respectively. Quantitative
PCR (qPCR) of 16S rRNA-encoding genes was performed as previously
described. In brief, qPCR was performed on a Roche LightCycler 480 (Roche
Scientific) using the primers (340F, 5′-ACTCCTACGGGAGGCAGCAGT-3′)
and (514R, 5′-ATTACCGCGGCTGCTGGC-3′). Reactions were run at 95 °C
for 3 min, followed by 40 cycles of 95 °C for 15 s and 63 °C for 60 s. Specific
amplification of targets was quantified using dilution curves of a purified
pCR4-TOPO vector (Invitrogen) containing a cloned 16S rRNA-encoding gene from
Blautia producta (Prevot) (ATCC, 27340D-5) as a standard for both blood and
faeces. Standards ranging in concentration from 10⁸ to 10⁰ plasmid copies per µl
were run in parallel with our samples for both blood and faeces during each
qPCR run. Using those results, a standard curve was generated to quantify the
number of copies within the samples. To determine the bacterial load
in the faeces samples, the results were normalized to faecal weight. Bacterial
16S rRNA gene qPCRs from MLN and spleen were normalized to the host
murine Ifnb1 gene (forward: 5′-CCATCCAAGAGATGCTCCAG-3′; reverse:
5′-GTGGAGAGCAGTTGAGGACA-3′). C57BL/6 wild-type germ-free mice served
as negative control (Fig. 1b).
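
Quantification against a plasmid dilution series is commonly done by fitting a line to quantification cycle (Cq) versus log10 of the copy number and inverting it for each sample. The sketch below illustrates that general approach in plain Python; it is not the LightCycler software's own algorithm, and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_standard_curve(standards):
    """standards: list of (plasmid copies per microlitre, Cq) pairs."""
    log_copies = [math.log10(copies) for copies, _ in standards]
    cq_values = [cq for _, cq in standards]
    return fit_line(log_copies, cq_values)


def copies_from_cq(cq, slope, intercept):
    """Invert Cq = slope * log10(copies) + intercept for a sample."""
    return 10 ** ((cq - intercept) / slope)


def copies_per_gram_faeces(copies, faecal_weight_g):
    """Normalize a copy number to faecal weight, as described for faeces."""
    return copies / faecal_weight_g
```

A sample's Cq is only meaningful within the range spanned by the standards, so values outside that range would need to be treated with caution.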

16S rRNA amplicon library preparation, sequencing and data analysis. Faeces,
jejunal and colon contents, and mucosal scrapings from the jejunum were collected
and snap-frozen in liquid nitrogen. Bacterial DNA was extracted using the Fast
DNA Stool Mini Kit (Qiagen, 51604). 16S rRNA amplicon library preparation and
sequencing was performed as previously described. Raw sequencing data were
de-multiplexed, and partially overlapping paired-end reads were merged using
illumina-utils. Mismatches at the overlapping regions of pairs were resolved
using the base with the higher Q-score, and the merged sequences were kept for
downstream analyses only (1) if they contained fewer than three mismatches at the
overlapping region, and (2) if 66% of the bases in the first half of each read had
an average Q-score of 30. The quality-filtered reads were partitioned into
ecologically relevant units using minimum entropy decomposition (MED) with default
parameters. Using Shannon entropy, MED resolves a given amplicon dataset
iteratively into oligotypes, which are able to describe differences between microbial
groups at a single-nucleotide resolution. The taxonomy of oligotypes was inferred
through GAST. The metaMDS function in the vegan library was used to
generate ordinations of samples based on their oligotype profiles using non-metric
multidimensional scaling with Bray–Curtis distances, and the envfit function in
vegan with 999 permutations was used to test whether the genotype or the source
of samples were significantly associated with differences between microbiomes.
The pheatmap library for R was used to generate heat-map displays of the sample
oligotype profiles, and the hierarchical clustering of samples in heat maps employed
Bray–Curtis distances and the average linkage algorithm.
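
For readers unfamiliar with the Bray–Curtis measure used for both the ordination and the clustering, the following sketch computes it for a pair of oligotype count profiles. It is a plain-Python illustration of the standard formula only, not the vegan implementation, and it does not cover the ordination or permutation testing; the function names are hypothetical.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same oligotypes")
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        return 0.0  # two empty profiles are treated as identical here
    difference = sum(abs(a - b) for a, b in zip(profile_a, profile_b))
    return difference / total


def distance_matrix(profiles):
    """profiles: dict mapping sample name to a list of oligotype counts."""
    names = list(profiles)
    return {(p, q): bray_curtis(profiles[p], profiles[q])
            for p in names for q in names}
```

The value ranges from 0 for identical profiles to 1 for profiles that share no oligotypes.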

Analysis of cytokine production in blood plasma and intestinal explants. Blood
was collected by cheek bleeding (after sterilizing cheeks with 70% ethanol wipes)
in EDTA (Thermo Fisher Scientific, MT-46034CI)-coated tubes, centrifuged at
2,000g for 10 min at 4 °C and plasma was stored at −80 °C until cytokine analysis.
One-centimetre pieces from the jejunum and colon were collected, opened
longitudinally, washed in PBS and explants were cultured in 500 µl RPMI 1640 medium
(Fisher Scientific, MT-10-040-CV) containing 10% FBS, 200 U ml−1 penicillin,
and 200 µg ml−1 streptomycin (MP Biomedicals, 091670249) for 48 h. The IL-6
cytokine amount in blood plasma and in supernatants of intestinal explants was
determined with BioPlex multianalyte technology (Biorad) according to the
manufacturer's instructions.

In vitro haematopoietic methylcellulose colony-forming assay. Similar to a
related study, lineage-depleted (mouse Lineage Cell Depletion Kit; Miltenyi Biotec,
130-090-858) haematopoietic progenitors of the bone marrow and the spleen were
seeded in duplicate in cytokine-supplemented methylcellulose (MethoCult, M3434,
STEMCELL Technologies) at a density of 5,000 cells per plate (bone marrow) or
25,000 cells per plate (spleen). Colonies were counted every 7 days and 10,000 cells
per plate were seeded for replatings. The average number of colonies per 10,000
plated cells is shown for 5 sequential platings. For anti-mouse IL-6 antibody
(MP5-20F3, Bio X Cell, BE0046) and isotype (HRPN, Bio X Cell, BE0088) treatment,
anti-IL-6 antibody or isotype was freshly added to the cytokine-supplemented
methylcellulose medium before every replating at a concentration of 100 µg ml⁻¹.

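
The normalization to colonies per 10,000 plated cells can be sketched as below. This is a minimal illustration of the arithmetic, assuming the duplicate plates are averaged before scaling; the function name and the colony counts are hypothetical and do not come from the study.

```python
def colonies_per_10k(colony_counts, cells_plated):
    """Mean colony count of replicate plates, scaled to 10,000 plated cells."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    mean_count = sum(colony_counts) / len(colony_counts)
    return mean_count * 10_000 / cells_plated


# Hypothetical duplicate counts at the seeding densities named in the text.
bone_marrow = colonies_per_10k([40, 50], cells_plated=5_000)
spleen = colonies_per_10k([30, 20], cells_plated=25_000)
replating = colonies_per_10k([62, 58], cells_plated=10_000)
```

Scaling to a common denominator makes the first plating (5,000 or 25,000 cells) comparable with the replatings (10,000 cells).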
Haematoxylin and eosin staining. Jejunum was flushed with cold PBS (GE 
Healthcare Life Science, SH30028.02) followed by immediate flushing with 10% 
formalin (Fisher Scientific, L14416). A section of ~3 cm from jejunum was cut 
with a razor blade, cut open longitudinally and pinned out with needles in wax 
boxes. Tissue was fixed for 24h at room temperature and pre-embedded in 2% agar 
(Sigma, A7921). Haematoxylin and eosin staining was performed on 5-µm paraffin-
embedded intestinal sections. The haematoxylin and eosin-stained slides were
digitized (objective: ×20; camera: CIS 3CCD 2 Mega Pixel) with the Pannoramic
MIDI Scan System (3DHISTECH). The Pannoramic Viewer software (3DHISTECH)
was used for imaging.

Immunofluorescence staining. For ZO-1 staining, jejunum and proximal colon
were dissected and cross-sections were frozen in optimum cutting temperature
compound (Tissue-Tek). Frozen sections of 5-µm thickness were cut and fixed
in 1% paraformaldehyde in 1× PBS for 30 min at room temperature. After washing
in PBS, three 5-min incubations in PBS with 50 mM NH4Cl were performed,
followed by permeabilization in PBS with 0.5% NP-40 for 20 min, and three addi- 
tional 5-min washes in PBS. Tissue sections were then incubated with mouse IgG 
ZO-1 monoclonal antibody (ZO1-1A12, Thermo Fisher Scientific, 33-9100) in 
PBS with 10% normal donkey serum overnight at 4°C, washed five times with 
PBS with 1% normal donkey serum (5 min each) and incubated with Alexa Fluor 
594-conjugated donkey anti-mouse (IgG, Jackson ImmunoResearch, 715-586-151) 
and Hoechst 33342 (ThermoFisher Scientific, H3570) for 1 h at room temperature. 
After incubation, tissue was washed five times with PBS with 1% normal donkey 
serum (5 min each) and mounted in Pro-Long Gold slides followed by imaging 
using an Olympus IX81 inverted microscope with a Photometrics CoolSNAP HQ2
camera. For cleaved caspase 3 and TUNEL staining, jejunum was dissected and 
cross-sections were fixed in 10% formalin, and then processed for paraffin section- 
ing. Five-micrometre sections were dewaxed by immersion in xylene and hydrated 
by serial immersion in ethanol and PBS. Antigen retrieval was performed by incu- 
bating sections in a pressure cooker (Cuisinart) for 15 min in Target Retrieval 
Solution (DAKO, S1699). Sections were washed with PBS (twice for 10 min),
and blocking buffer (TBS containing 10% BSA (EMD Millipore Sigma, 82-045-1)
and OmniPur Triton X-100 (0.3%, EMD Millipore Sigma, 9400)) was added
for 1 h. Sections were incubated with anti-pan-keratin (dilution 1:200, ABCAM,
ab6401) and anti-cleaved caspase 3 (dilution 1:500, Cell Signaling Technology, 
9661S) antibody in blocking buffer overnight at 4°C and then incubated with goat 
anti-mouse IgG1 Alexa Fluor 488 (Thermo Fisher Scientific, A21121) or goat
anti-rabbit Alexa Fluor 594 (Thermo Fisher Scientific, A11037) labelled second- 
ary antibodies (dilution 1:400, Thermo Fisher Scientific) for 30 min. Sections were 
mounted with Fluoromount-G (Beckman Coulter, 731604). DNA fragmentation 
in gut sections was assayed by TUNEL assay in formalin-fixed sections accord- 
ing to the manufacturer's instructions (Roche Applied Science, 12156792910). 
Sections were stained with DAPI (10 µg ml⁻¹; Invitrogen, D1306) and mounted
with Fluoromount-G (Beckman Coulter, 731604). As a positive control, jejunum 
sections of VDTR mice (mice expressing diphtheria toxin receptor under the villin 
promoter (epithelial specific)) treated with 10 ng of diphtheria toxin per g of body 
weight, for 12 h, were used.

Statistical analysis. The majority of experiments were repeated at least three times 
to obtain data for indicated statistical analyses. Mice were allocated to experimental 
groups on the basis of their genotype and randomized within the given sex- and 
age-matched group. Because our mice were inbred and matched for age and sex, 
we always assumed similar variance between the different experimental groups. 
We did not perform an a priori sample size estimation but always used as many 
mice per group as possible in an attempt to minimize type I and type II errors.
Investigators were not blinded during experiments and outcome assessment, except 
for microscopic analysis of fluorescent immunostaining, which was performed 
blinded. For bacterial culture experiments (aerobic and anaerobic) (Fig. 1d) a 
representative image out of five biological replicate experiments is shown. For 
microscopy analyses (Extended Data Figs. 2a-c, 4d, 10g) representative images out 
of at least three biological replicate experiments are shown. All experimental and 
control animals were littermates and none were excluded from the analysis at the 
time of collection. The number of mice per group is described in the corresponding 
figure legends as n, and all quantitative data are presented as mean ± standard error of the
mean (s.e.m.), unless otherwise indicated. Data were first analysed for normal
distribution using D'Agostino and Pearson omnibus normality tests. Normally 
distributed data were analysed using a paired or unpaired two-tailed Student's t-test 
for single comparisons, and one-way or two-way ANOVA for multiple comparisons.
ANOVA was followed by Sidak’s post hoc test. Data that were not
normally distributed were analysed using an unpaired two-tailed Mann-Whitney
U-test for single comparisons, and a Kruskal-Wallis test with Dunn’s post hoc test for
multiple comparisons. Correlations were calculated using the Pearson and Spearman
correlation. Figures and statistical analysis were generated using GraphPad Prism 
6 (GraphPad Software). The statistical test used and P values are indicated in each 
figure legend. P values of <0.05 were considered to be statistically significant. 
*P < 0.05, **P < 0.01, ***P < 0.001 and ****P < 0.0001.
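
The decision rule for single comparisons can be sketched in Python as follows. This is only an illustration, not the GraphPad Prism workflow used for the study: it assumes SciPy's `normaltest` (a D'Agostino-Pearson omnibus test) as the normality check and a 0.05 cut-off for that check, and the function name `compare_two_groups` is hypothetical.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick and run an unpaired two-group test based on a normality check."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Assumption: both groups must pass the omnibus normality test.
    both_normal = all(stats.normaltest(x).pvalue > alpha for x in (a, b))
    if both_normal:
        # Equal variances are assumed, as stated in the text.
        result = stats.ttest_ind(a, b, equal_var=True)
        name = "unpaired two-tailed Student's t-test"
    else:
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "unpaired two-tailed Mann-Whitney U-test"
    return name, float(result.pvalue)
```

The omnibus normality test needs roughly eight or more observations per group, so a sketch like this applies only to the larger groups; comparisons across more than two groups would follow the same pattern with ANOVA or Kruskal-Wallis in place of the two-group tests.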

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 


Data availability. Datasets that support the findings of this study have been 
deposited online in the Gene Expression Omnibus (GEO) under accession 
number GSE99333. Source Data for all mouse experiments have been provided. 
All other data are available from the corresponding author upon reasonable 
request. 
Extended Data Fig. 1 | Tet2−/− mice display variability in PMP
penetrance that correlates with bacteraemia. a, Penetrance of PMP
assessed by splenomegaly, splenic LSK cell expansion and CD11b+Gr1+
myeloid cell expansion in spleen and peripheral blood. b-e, Correlation
between frequency of peripheral-blood CD11b+Gr1+ myeloid cells
(peripheral-blood biomarker) and numbers of CD11b+Gr1+ myeloid
cells in spleen (b) and peripheral blood (c), numbers of LSK cells (d) and
splenomegaly (e), in Tet2+/+ and Tet2−/− mice. In b-e, dotted vertical
lines indicate level of 16% for peripheral-blood CD11b+Gr1+ myeloid
cells (level determined by the distribution in wild-type littermates).
The peripheral-blood biomarker (frequency of CD11b+Gr1+ myeloid
cells) was used to categorize Tet2−/− mice either as symptom free (<16%
CD11b+Gr1+ myeloid cells) or as mice with PMP (>16% CD11b+Gr1+
myeloid cells). Spearman correlation test excluding Tet2+/+ mice.
Red lines indicate regression line calculated for Tet2−/− data points (n = 10
(blue) or 23 mice (red)). f-i, Symptom-free Tet2−/− mice, Tet2−/− mice with PMP and
littermate controls were used. f, Representative dot plots are shown of
LSK cells in the spleen, CD11b+Gr1+ myeloid cells in the spleen and
peripheral blood, and lymphocytes in peripheral blood. Data are
representative of five independent experiments with similar results.
g, Spleen weight (top; n = 11 (blue), 9 (red, PMP) or 5 mice (red, symptom
free)) and numbers of CD11b+Gr1+ myeloid cells in the spleen (bottom;
n = 10 (blue), 12 (red, PMP) or 7 mice (red, symptom free)). h, Numbers
per ml of peripheral blood of live CD45+ gated white blood cells (WBC)
(top left), lymphocytes (top right), CD11b+ monocytes (bottom left) and
CD11b+Gr1+ myeloid cells (bottom right) (n = 15 (blue), 15 (red, PMP) or
32 mice (red, symptom free)). i, In vitro HSC self-renewal colony-forming
assay of haematopoietic progenitors of the bone marrow (n = 3 mice).
Mean ± s.e.m. j, k, Correlation between 16S gene copies in the peripheral
blood and spleen weight (n = 16 mice) (j) or numbers of CD11b+
monocytes (k top left), percentage of lymphocytes (k top right), numbers
of lymphocytes (k bottom left), and numbers of leukocytes (WBC)
(k bottom right) in the peripheral blood (n = 40 mice). Pearson correlation
test. l, Bacterial colonies from Fig. 1e, f identified by 16S sequencing;
blue rectangles indicate presence of bacteria. In g, h, boxes represent
median values and interquartile ranges; whiskers represent minimum
and maximum values. One-way ANOVA, Sidak’s post hoc test. Data
are representative of at least three independent experiments. *P < 0.05,
**P < 0.01, ***P < 0.001, ****P < 0.0001.
Extended Data Fig. 2 | Tet2−/− mice display no anatomical changes in
the jejunum. a, Representative images of haematoxylin and eosin-stained
sections of the jejunum. b, c, Immunofluorescence analyses of sections of
the jejunum of Tet2+/+ and Tet2−/− mice. b, No difference was observed
between the groups in the number of apoptotic cells using the TUNEL
assay (red). Nuclei of intestinal cells were visualized with DAPI (blue).
c, No difference was observed in the number of apoptotic (cleaved caspase
3-positive (CC3), red) epithelial cells (pan-keratin-positive (PanK), green)
between the groups. In b, c, jejunum sections of diphtheria-toxin-treated
VDTR mice were used as a positive control. Data are representative of at
least two independent experiments.
Extended Data Fig. 3 | Disruption of barrier integrity drives PMP.
a, FITC-dextran concentration in blood plasma correlates with numbers
of LSK cells (left, n = 12 mice), frequency of CD11b+Gr1+ myeloid cells
in the spleen (middle left, n = 10 mice), peripheral blood (middle right,
n = 15 mice) and frequency of lymphocytes (right, n = 15 mice). Pearson
correlation test. b, Schematic of DSS treatment of symptom-free Tet2f/fVavcre
mice that are over 20 weeks old, and littermate controls. c, Numbers of
CD11b+ monocytes (left), lymphocytes (middle) and leukocytes (WBC)
(right) before, during and after DSS treatment (n = 5 (blue) or 8 (orange)
mice). Mean ± s.e.m. d, Percentage of CD11b+Gr1+ myeloid cells (left)
and numbers of GMP (right) at end point analysis (n = 5 (Tet2f/fcre−,
without DSS), 4 (Tet2f/fVavcre, without DSS), 6 (Tet2f/fcre−, with DSS) or 8
(Tet2f/fVavcre, with DSS) mice). Centre is median, Kruskal-Wallis test, Dunn’s
post hoc test. *P < 0.05, **P < 0.01. Data are representative of at least two
independent experiments.
Extended Data Fig. 4 | Tight junction ZO-1 is markedly reduced in the
jejunum but not in the colon of Tet2−/− mice. a, RNA sequencing of
whole intestinal tissue of duodenum, jejunum, ileum and colon of three
Tet2−/− and three Tet2+/+ mice. Venn diagram illustrating the number
of differentially expressed genes (|log2(fold change)| > 0.5 and false
discovery rate < 0.1; see Supplementary Table) in Tet2−/− mice compared
to littermate controls. b, RNA sequencing of the jejunum of seven Tet2−/−
and seven Tet2+/+ mice (the full list of differentially expressed genes is
shown in Supplementary Table). Heat map of a selection of antimicrobial
peptides and tight junction genes. c, Gene expression of tight junction
gene (ZO-1, Tjp1) in the jejunum (left; n = 12 (Tet2+/+) or 9 (Tet2−/−)
mice) and colon (right; n = 9 mice in both cases). Centre is mean, two-tailed
unpaired t-test. d, ZO-1 immunofluorescence shows reduced tight
junction staining of epithelial cells in the jejunum (left) but no difference
in colonic epithelial cells (right) of Tet2−/− mice. Representative images
are shown from Tet2+/+ (n = 3 mice) and Tet2−/− mice (n = 7 mice). Scale
bars, 100 µm. Green, ZO-1; blue, DAPI. **P < 0.01. Data are representative
of at least two independent experiments.
Extended Data Fig. 5 | Absence of Tet2 expression in haematopoietic
cells leads to barrier dysfunction and PMP. a, Intestinal permeability
measurement by blood plasma FITC-dextran concentrations (n = 16
(Tet2f/fcre−), 6 (Tet2f/fVavcre), 12 (Tet2f/fVillincre) or 12 (Tet2f/fLysMcre)
mice). b, c, Gene expression of antimicrobial peptides Retnlb (b left), Ang4
(b right) and (c) tight junction genes ZO-1 (Tjp1) (left), occludin (Ocln)
(middle), and desmoplakin (Dsp) (right) in the jejunum (n = 13 (Tet2f/fcre−),
7 (Tet2f/fVavcre), 9 (Tet2f/fVillincre) or 10 (Tet2f/fLysMcre) mice).
d, Representative dot plots (left) and numbers of LSK cells (right) in the
spleen, CD11b+Gr1+ myeloid cells in the spleen (n = 6 (Tet2f/fcre−),
12 (Tet2f/fVavcre), 10 (Tet2f/fVillincre) or 10 (Tet2f/fLysMcre) mice) and
peripheral blood, and lymphocytes in the peripheral blood (n = 10
(Tet2f/fcre−), 8 (Tet2f/fVavcre), 10 (Tet2f/fVillincre) or 10 (Tet2f/fLysMcre)
mice). e, White blood cell (WBC) count (top) and cell number per ml
of blood of CD11b+ monocytes (bottom) (n = 10 (Tet2f/fcre−), 8
(Tet2f/fVavcre), 10 (Tet2f/fVillincre) or 10 (Tet2f/fLysMcre) mice). f, Spleen
weight is shown (n = 10 (Tet2f/fcre−), 6 (Tet2f/fVavcre), 10 (Tet2f/fVillincre) or
10 (Tet2f/fLysMcre) mice). In a-f, centre is mean, one-way ANOVA, Sidak’s
post hoc test. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. Data
are representative of three independent experiments.
Extended Data Fig. 6 | Microbial signals are sufficient for PMP in
Tet2−/− mice. a, TLR2 activation measured using a HEK-TLR2 reporter
assay (n = 5 biological replicates). b, Schematic of Pam3CSK4 in vivo
treatment of symptom-free Tet2f/fVavcre mice that are over 20 weeks old,
and littermate controls. c, Numbers of CD11b+ monocytes prior to treatment (day 0),
during treatment (day 2) and at end point analysis (day 14) (n = 6 mice).
Mean ± s.e.m. d, Percentage of CD11b+Gr1+ myeloid cells (left, n = 5
(Tet2f/fcre−, no Pam3CSK4), 6 (Tet2f/fcre−, with Pam3CSK4),
7 (Tet2f/fVavcre, no Pam3CSK4) or 6 (Tet2f/fVavcre, with Pam3CSK4) mice)
and numbers of GMP cells (right, n = 6 (Tet2f/fcre−, no Pam3CSK4), 6
(Tet2f/fcre−, with Pam3CSK4), 7 (Tet2f/fVavcre, no Pam3CSK4) or 6
(Tet2f/fVavcre, with Pam3CSK4) mice). e, Intestinal permeability
measurement by blood plasma FITC-dextran concentrations at end
point analysis (n = 6 mice in all cases). a, c-e, Centre is mean. a, d, One-way
ANOVA, Sidak’s post hoc test, *P < 0.05, **P < 0.01, ***P < 0.001,
****P < 0.0001. Data are representative of at least two independent
experiments.
Extended Data Fig. 7 | Microbial community structures are similar
between Tet2−/− mice and littermate controls. a, b, Non-metric
multidimensional scaling of samples based on their oligotype profiles
(with Bray-Curtis distance). a, Microbial community structures within
the same body site, that is, jejunum (left, n = 14 (Tet2−/− (KO)) or 9 mice
(Tet2+/+ (WT))), colon (middle, n = 7 (KO) or 5 mice (WT)) and faeces
(right, n = 6 (KO) or 5 mice (WT)), do not differ significantly across
genotypes based on envfit test with 999 permutations (P > 0.05, two-tailed).
b, Microbial community structures within the same genotype,
WT (top, n = 5 (colonic sample (COL) or faecal sample (FEC)) or 9
mice (jejunal sample (JEJ))) and KO (bottom, n = 6 (FEC), 7 (COL) or
14 mice (JEJ)), significantly differ across body sites based on the same
test (P < 0.001). c, d, Heat-map displays of oligotypes across samples.
Clustering dendrograms are computed with Bray-Curtis distance and
average linkage algorithm using the oligotype profiles. The intensity of
colours indicates the per cent abundance of a given oligotype (rows) in a
given sample (columns). c, Comparison of genotypes for samples collected
from a single body site. d, Comparison of genotypes for samples collected
across body sites. e, Co-housing does not affect the development of PMP
in Tet2f/fVavcre mice. Representative cage configurations of 3 litters from
Tet2f/fVavcre mice with PMP that are over 20 weeks old (red), symptom-free
Tet2f/fVavcre mice (grey), and littermate controls (blue). CON, luminal
content; SCR, scraping samples.
Extended Data Fig. 8 | Microbial signals are required for PMP in
Tet2−/− mice. a, b, Germ-free or SPF-housed Tet2−/− mice that are over 40
weeks old, and germ-free wild-type controls, analysed for percentage of
lymphocytes (a left), numbers of lymphocytes (a middle left), numbers of
CD11b+ monocytes (a middle right) and numbers of leukocytes (WBC)
(a right) (n = 6 (Tet2+/+, GF), 7 (Tet2−/−, GF) or 7 (Tet2−/−, SPF) mice),
and percentage of CD11b+Gr1+ myeloid cells (b left) and numbers of
GMP cells (b right) (n = 7 (Tet2+/+, GF), 4 (Tet2−/−, GF) or 5 (Tet2−/−,
SPF) mice). c, Mice treated with antibiotics (ABX) before onset of PMP
(see schematic in Fig. 3c). Numbers of CD11b+Gr1+ myeloid cells (left;
n = 14 (Tet2+/+, no ABX), 11 (Tet2−/−, no ABX), 7 (Tet2+/+, with ABX)
or 7 (Tet2−/−, with ABX) mice) and LSK cells (right; n = 12 (Tet2+/+, no
ABX), 8 (Tet2−/−, no ABX), 7 (Tet2+/+, with ABX) or 7 (Tet2−/−, with
ABX) mice). d, Tet2−/− mice monitored for the number of CD11b+
monocytes (left, n = 7 mice) and 16S gene copies in the faeces (right,
n = 6 mice) before, during and after antibiotics treatment. Mean ± s.e.m.,
repeated measures one-way ANOVA, Sidak’s post hoc test. In a-c, centre
is mean, one-way ANOVA, Sidak’s post hoc test, *P < 0.05, **P < 0.01,
***P < 0.001, ****P < 0.0001. Data are representative of at least two
independent experiments.
Extended Data Fig. 9 | Antibiotic treatment reverses PMP in Tet2−/−
mice. a-j, Tet2−/− mice with PMP that are over 20 weeks old, and
littermate controls, were treated with and without antibiotics (ABX) for
four weeks. a, Numbers of peripheral-blood CD11b+ monocytes (n = 13
mice). Lines connect values obtained from the same mouse sampled
before (−) and after (+) ABX treatment. b, c, Representative dot plots
and percentage of CD11b+Gr1+ myeloid cells (b) (n = 20 (Tet2+/+, no
ABX), 21 (Tet2−/−, no ABX), 6 (Tet2+/+, with ABX) or 14 (Tet2−/−, with
ABX) mice) and LSK cells in the spleen (c) (n = 11 (Tet2+/+, no ABX),
12 (Tet2−/−, no ABX), 6 (Tet2+/+, with ABX) or 12 (Tet2−/−, with ABX)
mice). d, Representative dot plots and numbers of bone marrow-derived
LSK cells (n = 11 (Tet2+/+, no ABX), 13 (Tet2−/−, no ABX), 6 (Tet2+/+, with
ABX) or 12 (Tet2−/−, with ABX) mice). e, f, Representative dot plots and
numbers of splenic (e) (n = 6 (Tet2+/+, no ABX), 12 (Tet2−/−, no ABX), 5
(Tet2+/+, with ABX) or 12 (Tet2−/−, with ABX) mice) and bone marrow-derived
(f) LSK gated CD34+Flt3− (Flt3 is also known as CD135) short-term
(ST)-HSCs and CD34−Flt3− long-term (LT)-HSCs (n = 6 (Tet2+/+,
no ABX), 13 (Tet2−/−, no ABX), 5 (Tet2+/+, with ABX) or 12 (Tet2−/−,
with ABX) mice). g, h, Representative dot plots and numbers of splenic
(g) (n = 5 (Tet2+/+, no ABX), 10 (Tet2−/−, no ABX), 5 (Tet2+/+, with ABX)
or 10 (Tet2−/−, with ABX) mice) and bone marrow-derived (h) LSK gated
CD150+CD48− cells (n = 5 (Tet2+/+, no ABX), 11 (Tet2−/−, no ABX), 5
(Tet2+/+, with ABX) or 10 (Tet2−/−, with ABX) mice). i, j, Representative
dot plots and numbers of splenic (i) (n = 6 (Tet2+/+, no ABX), 12 (Tet2−/−,
no ABX), 5 (Tet2+/+, with ABX) or 12 (Tet2−/−, with ABX) mice) and
bone marrow-derived (j) c-Kit+Sca-1− (LK gated) GMP, common myeloid
progenitor (CMP) and megakaryocyte-erythroid progenitor (MEP) cells
(n = 6 (Tet2+/+, no ABX), 13 (Tet2−/−, no ABX), 5 (Tet2+/+, with ABX) or
12 (Tet2−/−, with ABX) mice). In b-j, centre is mean, one-way ANOVA,
Sidak’s post hoc test. Data are representative of at least three independent
experiments. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.


© 2018 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




Extended Data Fig. 10 | Bacteria-induced IL-6 is required for PMP
in Tet2−/− mice. a, Gene expression of Il6 in the spleen of SPF-housed
Tet2−/− mice, littermates and germ-free Tet2−/− mice (n = 7 (Tet2+/+,
SPF), 10 (Tet2−/−, SPF) or 4 (Tet2−/−, GF) mice). b, IL-6 cytokine levels
in blood plasma of Tet2−/− mice with PMP treated with antibiotics (ABX)
for four weeks (n = 5 mice). Lines connect values obtained from the same
mouse sampled before and after antibiotics treatment. Two-tailed paired
t-test. c, IL-6 cytokine levels in blood plasma of DSS-treated symptom-free
Tet2f/fVavcre mice that are over 20 weeks old, and littermate controls (see
schematic in Extended Data Fig. 3b) (n = 6 (Tet2+/+) or 9 (Tet2−/−) mice).
d, IL-6 cytokine levels in blood plasma of Tet2f/fVavcre mice that are over
20 weeks old, and littermate controls, treated with the TLR1/2 agonist
Pam3CSK4 (see schematic in Extended Data Fig. 6b) (n = 6 (Tet2+/+) or
8 (Tet2−/−) mice). e, Correlation between IL-6 cytokine levels in blood
plasma of Tet2−/− mice and numbers of peripheral-blood CD11b+
monocytes (n = 49 mice). Pearson correlation test. f, g, In vitro HSC
self-renewal colony-forming assay of haematopoietic progenitors of the
spleen from Tet2−/− mice (red lines) and littermate controls (blue lines)
in the presence of anti-IL-6 antibody or isotype control (ISO) after the
first replating (n = 3 mice). g, Representative images of colonies after the
5th replating. Scale bars, 100 µm. h, Representative histogram (left) and
quantification (right) of mean fluorescence intensity (MFI) of IL-6Rα+c-Kit+Sca-1−
(LK gated) CD34+FcγRIII/II+ GMPs from the bone marrow
(n = 5 (Tet2+/+, SPF), 6 (Tet2−/−, SPF), 4 (Tet2+/+, GF) or 3 (Tet2−/−, GF)
mice). i, Representative flow cytometry plot of Stat3 phosphorylation
(pY705) response after 30-min stimulation with 10 ng ml⁻¹ IL-6 in splenic
c-Kit+Sca-1− (LK gated) CD34+ and CD34− myeloid progenitors (MP).
j, Numbers of CD11b+ myeloid cells (n = 7 (Tet2+/+) or 6 (Tet2−/−) mice).
k, l, CD11b+F4/80+ macrophages and CD11b+Gr1+ myeloid cells of the
spleen, from Tet2−/− mice and littermate controls, were FACS sorted.
Gene expression of Il6 in macrophages (k) and CD11b+Gr1+ cells
(l) (n = 7 (Tet2+/+) or 6 (Tet2−/−) mice). Centre is median.
m, Representative histogram (left) and quantification (right) of MFI of
IL-6Rα+CD11b+Gr1+ myeloid cells in the spleen (n = 5 (Tet2+/+, SPF),
6 (Tet2−/−, SPF), 4 (Tet2+/+, GF) or 4 (Tet2−/−, GF) mice).
n, Representative histogram (left) and quantification (right) of MFI of
IL-6Rα+c-Kit+Sca-1− (LK gated) CD34+FcγRIII/II+ GMPs from the
spleen of Tet2f/fLysMcre mice and littermate controls (n = 5 mice). Centre
is mean. o, p, Anti-IL-6 antibody (+) or ISO treatment (−) of Tet2−/−
mice with PMP that are over 20 weeks old, and littermate controls (see
schematic in Fig. 4g). o, Percentage of CD11b+Gr1+ myeloid cells (left;
n = 10 (Tet2+/+, with anti-IL-6), 6 (Tet2−/−, with anti-IL-6) or 7 (Tet2−/−,
without anti-IL-6) mice) and numbers of GMPs (right; n = 11 (Tet2+/+,
with anti-IL-6), 6 (Tet2−/−, with anti-IL-6) or 8 (Tet2−/−, without
anti-IL-6) mice) in the spleen. p, Intestinal permeability was assessed by
blood plasma FITC-dextran concentrations (n = 6 mice in all cases).
q, IL-6 in supernatants from intestinal explants: jejunum (left; n = 6 (Tet2+/+)
or 5 (Tet2−/−) mice) and colon (right; n = 6 mice in both cases). Centre is
mean. a, h, m, p, Centre is mean, one-way ANOVA, Sidak’s post hoc test.
c, d, j, Centre is median, two-tailed Mann-Whitney U-test. o, Centre is
median, Kruskal-Wallis test, Dunn’s post hoc test. Data are representative of at
least three independent experiments; *P < 0.05, **P < 0.01, ***P < 0.001.
r, Model showing that extrinsic (microbial-induced inflammatory) and
intrinsic (IL-6Rα expression) signals are required for PMP in Tet2−/−
mice. In detail, small-intestinal barrier dysfunction (reduced ZO-1 and
upregulation of defence response genes), which occurs spontaneously or
upon intestinal damage, leads to bacterial translocation and to high levels
of IL-6. Bacterial translocation can be bypassed when Tet2-deficient mice
receive systemic microbial signals. Microbial-induced IL-6 is sensed by
Tet2−/− myeloid progenitor (MP) cells that overexpress IL-6Rα and are
highly sensitive to IL-6 (Stat3 (pY705)). Subsequently, MPs expand upon
IL-6 signals and preferentially differentiate into mature myeloid cells
with IL-6-producing capacities. This cycle results in the development of
PMP. Treatment with antibiotics or neutralizing anti-IL-6 antibody can
revert PMP, indicating that microbial inflammatory signals are required
for PMP in the context of Tet2 deficiency. However, whether bacteria-induced
inflammatory signals also create a permissive environment
that induces the acquisition of cooperative oncogenic mutations that
lead to the development of leukaemia remains to be determined. The
mechanisms through which Tet2 deficiency in haematopoietic cells leads
to a microbiota-dependent impairment of gut barrier function remain to
be addressed.
KLHL22 activates amino-acid-dependent mTORC1
signalling to promote tumorigenesis and ageing 


Jie Chen, Yuhui Ou, Yanyan Yang, Wen Li, Ye Xu, Yuntao Xie & Ying Liu


The mechanistic target of rapamycin complex 1 (mTORC1) is
a master regulator of cell growth that responds to a diverse set
of environmental cues, including amino acids. Deregulation
of mTORC1 has been linked with metabolic diseases, cancer
and ageing. In response to amino acids, mTORC1 is recruited
by the Rag GTPases to the lysosome, its site of activation. The
GATOR1 complex, consisting of DEPDC5, NPRL3 and NPRL2,
displays GAP activity to inactivate Rag GTPases under
amino-acid-deficient conditions. However, it is unclear how the inhibitory
function of GATOR1 is released upon amino acid stimulation.
Here we find that in response to amino acids, the CUL3-KLHL22
E3 ubiquitin ligase promotes K48-linked polyubiquitination and
degradation of DEPDC5, an essential subunit of GATOR1. KLHL22
plays a conserved role to mediate the activation of mTORC1 and
downstream events in mammals and nematodes. Depletion of
MEL-26, the Caenorhabditis elegans orthologue of KLHL22, extends
worm lifespan. Moreover, KLHL22 levels are elevated in tumours of
breast cancer patients, whereas DEPDC5 levels are correspondingly
reduced. Depletion of KLHL22 in breast cancer cells suppresses
tumour growth in nude mice. Therefore, pharmacological
interventions targeting KLHL22 may have therapeutic potential
for the treatment of breast cancer and age-related diseases.

To understand how inhibition by GATOR1 is released during
mTORC1 activation, we monitored the stability of GATOR1 subunits
in response to amino acid availability. Notably, the protein levels of
DEPDC5, but not NPRL3 or NPRL2, were regulated in an amino-acid-sensitive
manner (Fig. 1a and Extended Data Fig. 1a). By contrast,
levels of DEPDC5 transcripts remained unchanged in the presence
or absence of amino acids (Extended Data Fig. 1b). The stability of
DEPDC5 protein was extremely low under the basal culturing condition
(Extended Data Fig. 1c, d). Accumulation of DEPDC5 was
detected only when we treated cells with MG132, the 26S proteasome
inhibitor (Extended Data Fig. 1e), suggesting that DEPDC5 undergoes
proteasome-mediated degradation in response to amino acids.

Covalent conjugation of ubiquitin is a key step in proteasome- 
mediated degradation of target proteins. Indeed, the ubiquitination 
of DEPDC5 was observed in the presence of amino acids (Fig. 1b and 
Extended Data Fig. 1f). DEPDC5 could be labelled only with wild-type 
or K48 ubiquitin (ubiquitin mutant that contains only one lysine), but 
not with K63 ubiquitin (Fig. 1c), indicating that amino acids promote 
K48-linked ubiquitination of DEPDC5. 

We next sought to identify the E3 ubiquitin ligase that targets
DEPDC5. Because mTORC1 signalling is often deregulated in human
cancers, and the GATOR complex has been reported to function on
the lysosomal surface, we screened a panel of E3 ligases that have been
reported to affect mTORC1 activity or tumorigenesis, together with E3
ligases localized to the lysosomes. We found that ectopic expression of
KLHL22, mutations of which have been linked with breast cancer,
promoted the degradation of endogenous DEPDC5 (Extended Data
Fig. 2a). KLHL22 is a BTB (Bric-à-brac-Tramtrack-Broad) adaptor
protein, usually forming a functional cullin-RING E3 ubiquitin ligase
complex with the scaffold protein CUL3 and the ring-finger protein
RBX1. Notably, CUL3 and RBX1 were identified by mass spectrometry
during our search for DEPDC5-interacting proteins (Extended
Data Fig. 2b). The substrate specificity of cullin-RING E3 ubiquitin
ligase is determined through a BTB adaptor protein within the
complex. Indeed, overexpression of CUL3-RBX1-KLHL22, but not
CUL3-RBX1-KLHL19, promoted the ubiquitination and degradation
of DEPDC5 (Fig. 1d, e). Deletion of the 6-Kelch (6K) repeats, the
substrate recognition motif of KLHL22, or knockdown of KLHL22 by
small interfering RNA (siRNA), blocked DEPDC5 ubiquitination and
degradation (Fig. 1d-g). Furthermore, recombinant CUL3-RBX1-KLHL22
proteins were sufficient to promote DEPDC5 ubiquitination
in vitro (Extended Data Fig. 2c, d).

Several lysine residues were predicted by UbPred, or have previously
been identified by mass spectrometry, as possible ubiquitination
sites of DEPDC5 (Extended Data Fig. 3a). K48-linked polyubiquitination
and degradation of DEPDC5, and subsequent mTORC1 activation
(indicated by S6K1 phosphorylation), were blocked if all five lysine
residues were simultaneously mutated to arginine (5KR) (Extended Data
Fig. 3b, c). However, none of the single mutations blocked DEPDC5
ubiquitination and degradation (Extended Data Fig. 3d, e). Therefore,
CUL3-KLHL22 E3 ligase catalyses K48-linked ubiquitination
on multiple sites of DEPDC5.

We next investigated how KLHL22 regulates DEPDC5 in response to
amino acid availability. KLHL22 interacted specifically with DEPDC5 in 
an amino-acid-sensitive manner (Fig. 1h and Extended Data Fig. 4a-c). 
The direct interaction between DEPDC5 and KLHL22 was further 
demonstrated using recombinant proteins in an in vitro binding assay 
(Extended Data Fig. 4d). However, the DEPDC5(5KR) mutant was 
not able to interact with KLHL22 (Extended Data Fig. 4d), possibly 
owing to the aberrant folding of DEPDC5 caused by the mutations. 
Therefore, the lysine residues responsible for DEPDC5 ubiquitylation 
by KLHL22 warrant further investigation. Through truncation map- 
ping tests, the DEP domain was identified as the degron of DEPDC5 
(Extended Data Fig. 4e, f), which is responsible for KLHL22-mediated 
degradation (Extended Data Fig. 4g). Mutation of each serine, thre- 
onine, or tyrosine within the degron did not prevent its interaction 
with KLHL22 (Extended Data Fig. 4h), suggesting that it might not be 
a phospho-degron. 

To understand how amino acids modulate KLHL22 activity, we 
monitored the localization of KLHL22 in response to amino acids. 
Notably, amino acids mediated nuclear-cytosolic shuttling of KLHL22 
(Extended Data Fig. 5a, b). Several phosphorylation sites have been 
reported in KLHL22!! (Extended Data Fig. 5c). We mutated each site 
to alanine, and found that S18 was required for the nuclear accumu- 
lation of KLHL22 in amino-acid-deprived conditions (Extended Data 
Fig. 5d). Through mass spectrometry analysis, we found that KLHL22 
associated with 14-3-3 proteins during amino acid starvation (Extended 
Data Fig. 5e). The S18A mutation disrupted the interaction between

Fig. 1 | CUL3-KLHL22 mediates K48-linked polyubiquitination and
degradation of DEPDC5. a, DEPDC5 is regulated in an amino-acid-sensitive
manner. AA, amino acid. b, Accumulation of ubiquitinated
DEPDC5 upon MG132 treatment. The asterisk represents HA-DEPDC5.
IB, immunoblot; Ub(), polyubiquitinated. c, DEPDC5 is modified by
K48-linked polyubiquitination. IP, immunoprecipitation; WCL, whole-cell
lysate. d, e, CUL3-KLHL22 promotes the ubiquitination (d) and


KLHL22 and 14-3-3, leading to the constitutive activation of mTORC1
(Extended Data Fig. 5f, g). Therefore, during amino acid starvation, 
KLHL22 was trapped by 14-3-3 proteins in the nucleus. Upon amino 
acid stimulation, KLHL22 was released and translocated to the cytosol. 
Once in the cytosol, KLHL22 localized at least in part to the lysosome 
(Fig. 1i), where GATOR1 resides!°'8. Consistently, KLHL22 accumulated
on purified lysosomes in the in vitro reconstitution system when
amino acids were supplemented (Fig. 1j and Extended Data Fig. 5h, i). 

Recruitment of mTORC1 to the surface of lysosomes is a key step 
for amino-acid-induced mTORC1 activation’. In KLHL22 knockout 
cells (sgKLHL22), mTORC1 could not accumulate on lysosomes upon 
amino acid stimulation (Fig. 2a and Extended Data Fig. 6a, b). In addi- 
tion, in KLHL22 knockout cells, amino acid replenishment was not able 
to reduce the DEPDC5 level or activate mTORC1 (Fig. 2b). Conversely,
ectopic expression of CUL3-KLHL22 induced S6K1 phosphorylation 
(Fig. 2c). Overexpression or depletion of KLHL22 did not perturb 
mTORC1 signalling in DEPDC5-deficient cells (Extended Data Fig. 6c, d), 
further suggesting that KLHL22 regulates the mTORC1 pathway 
through modulation of DEPDC5. 

Expression of dominant-negative Rag GTPases (RagDN) completely
blocked KLHL22-mediated mTORC1 activation (Fig. 2c), indicating
that KLHL22 acts upstream of Rag GTPases to mediate mTORC1 
signalling. Consistently, KLHL22 displayed nuclear-cytosolic shuttling 
and activated mTORC1 in response to leucine but not glutamine, an 
amino acid that activates mTORC1 in a Rag-independent fashion”? 
(Fig. 2d, e). mTORC1 negatively regulates catabolic pathways such as
autophagy”. Deletion of KLHL22 activated autophagy, as determined 
by the increased LC3BII/I ratio (Fig. 2f and Extended Data Fig. 6e) 
and the induction of LC3B puncta (Fig. 2g and Extended Data Fig. 6f). 
KLHL22 deficiency also reduced cell size (Fig. 2h), a phenotypic

degradation (e) of DEPDC5. f, g, Knockdown of KLHL22 suppresses the
ubiquitination (f) and degradation (g) of DEPDC5. h, KLHL22 interacts
with DEPDC5 in an amino-acid-dependent manner. i, KLHL22 partially
locates on lysosomes under amino-acid-sufficient conditions. LAMP2,
lysosome. Scale bar, 10 μm. j, Recombinant KLHL22 accumulates on
purified lysosomes in response to amino acids. For gel source data, see
Supplementary Fig. 1.


characteristic of mTORC1 inactivation”. Thus, KLHL22 is essential
for the activation of mTORC1 and downstream events.

To investigate whether KLHL22 plays an evolutionarily conserved 
role in TORC1 signalling, we knocked out KLHL22 in mouse embryonic
fibroblast (MEF) cells, and showed that mTORC1 failed to
accumulate on lysosomes and phosphorylate S6K1 upon amino acid 
stimulation (Fig. 3a, b and Extended Data Fig. 7a). Multiple genes are 
predicted as potential KLHL22 orthologues in C. elegans (Extended 
Data Fig. 7b). We screened these genes with the use of a reporter strain 
expressing GFP-tagged HLH-30, an orthologue of human transcription
factor EB (TFEB) that translocates to the nucleus upon TORC1
inhibition””*?. Knockdown of mel-26 or tag-53, but not any other gene,
induced the nuclear accumulation of HLH-30 (Fig. 3c and Extended 
Data Fig. 7c). RNAi knockdown of T08A11.1, which encodes the 
orthologue of human DEPDC5, impaired HLH-30 nuclear localization 
in mel-26-deficient animals, but not in tag-53-deficient worms (Fig. 3d 
and Extended Data Fig. 7c). Downregulation of T08A11.1 also restored 
the developmental delay of mel-26-deficient worms (Extended Data 
Fig. 7d). More importantly, expression of codon-optimized human 
KLHL22 suppressed the nuclear accumulation of HLH-30 in mel-26 
mutant animals (Fig. 3e), suggesting that MEL-26 functions as the 
C. elegans orthologue of KLHL22 to regulate T08A11.1 (Ce.DEPDC5).

Inhibition of TORC1 has been linked with lifespan extension in 
several species**”’. To determine the physiological significance of 
KLHL22, we compared the lifespans of wild-type worms with those 
of mel-26 mutants. Deletion of mel-26 significantly extended worm 
lifespan (Fig. 3f). The prolonged lifespan of mel-26 mutants was 
suppressed by RNAi knockdown of T08A11.1 (Fig. 3g). 

The mTORC1 pathway is frequently deregulated in many cancer 
types*?. Deletions in GATOR1 subunits NPRL2 and DEPDC5 have

Fig. 2 | KLHL22 is required for mTORC1 activation, autophagy and
cell growth. a, KLHL22 depletion prevents amino-acid-stimulated
lysosomal accumulation of mTORC1. Scale bar, 10 μm. b, Depletion of
KLHL22 suppresses amino-acid-dependent degradation of DEPDC5
and phosphorylation of S6K1. WT, wild-type. c, KLHL22 plays a role
upstream of Rag GTPases. RagDN: RagAS??_RagCST, dominant-negative
form of Rag GTPases. d, FLAG-KLHL22 translocates to the cytosol upon


been reported in lung and breast cancers, and two cases of glioblas- 
toma”®-3°. Overexpression of DEPDC5 reduced cell size (Extended 
Data Fig. 8a), whereas co-expression of KLHL22 restored it (Extended 
Data Fig. 8b). Furthermore, a stable HEK293T cell line with a high level 




stimulation with leucine but not glutamine. DAPI, nuclei. Scale bar, 10 μm.
e, Deprivation of KLHL22 prevents leucine-induced but not glutamine- 
induced S6K1 phosphorylation. f, KLHL22 is required for amino-acid- 
induced suppression of autophagy. n = 3 biologically independent 
experiments. g, Depletion of KLHL22 induces formation of LC3B puncta. 
DAPI, nuclei. Scale bar, 10 μm. h, Depletion of KLHL22 reduces cell size.
For gel source data, see Supplementary Fig. 1. 


of KLHL22 showed greatly altered growth morphology and increased 
anchorage-independent growth (Fig. 4a, b and Extended Data Fig. 8c). 
High-level expression of KLHL22 conferred increased sensitivity to 
rapamycin, an mTORC1 inhibitor (Fig. 4b).

Fig. 3 | KLHL22 plays an evolutionarily conserved role in the regulation
of DEPDC5. a, KLHL22 is required for amino-acid-induced lysosomal
localization of mTOR in MEF cells. Scale bar, 10 μm. b, Deficiency of
KLHL22 suppresses amino-acid-induced degradation of DEPDC5 

and phosphorylation of S6K1 in MEF cells. For gel source data, see 
Supplementary Fig. 1. c, RNA interference (RNAi) targeting mel-26 or 
tag-53 induces nuclear accumulation of HLH-30::GFP in C. elegans 




(arrows). Dashed boxes are expanded on right. d, RNAi targeting T08A11.1
suppresses nuclear accumulation of HLH-30 in mel-26-deficient, but not 
tag-53-deficient worms. e, Human KLHL22 prevents the nuclear accumulation 
of HLH-30 in mel-26-deficient worms. f, g, mel-26 mutant worms have 
prolonged lifespan (f) and RNAi of T08A11.1 suppresses lifespan extension
in mel-26 mutant worms (g). n = 2 biologically independent experiments.
***P < 0.001; log rank test of one replicate is shown.

Fig. 4 | KLHL22 acts as a potential oncogene in breast cancer. a, Protein
levels of FLAG-KLHL22 in cells stably expressing HA-DEPDC5 and
FLAG-KLHL22. Green or red denotes low or high protein level of 
KLHL22. b, High levels of KLHL22 promote anchorage-independent 
growth and confer increased rapamycin sensitivity. n = 3 biological 
replicates. **P < 0.01; ***P < 0.001, two-sided Student's t-test. 

c, Upregulation of KLHL22 and downregulation of DEPDC5 are detected
in tumours of breast cancer patients. T, tumour; ANT, adjacent normal 
tissues; P1, pair 1. d, e, Depletion of KLHL22 inhibits anchorage- 


The ‘Oncomine’ database revealed several cancer types with elevated 
KLHL22 transcript levels (Extended Data Fig. 8d). Protein levels of 
KLHL22 were elevated in multiple breast cancer cell lines, such as 
MDA-MB-231, MDA-MB-468 and BT549, compared with the cor- 
responding control cell line MCF10A (Extended Data Fig. 8e). More 
strikingly, the level of KLHL22 protein was markedly increased in 
breast tumours of patients, in comparison with that of the adjacent 
normal tissues (Fig. 4c). The elevation of KLHL22 correlated strongly 
with a reduction in DEPDC5 protein level in each tumour sample
(Fig. 4c). KLHL22 mRNA level was also elevated in tumour samples 
(Extended Data Fig. 8f), suggesting that KLHL22 might be regulated 
at the transcriptional level in breast cancer. 

Breast cancer cells (for example, MDA-MB-231) displayed similar 
regulation of DEPDC5 in response to amino acids (Extended Data
Fig. 8g-i). Thus, to test whether KLHL22 promotes tumour growth, 
we knocked out KLHL22 in MDA-MB-231 and MDA-MB-468, two 
breast cancer cell lines with elevated KLHL22 levels (Extended Data 
Fig. 9a-c). Depletion of KLHL22 suppressed amino-acid-triggered

independent growth of MDA-MB-231 (d) and MDA-MB-468 (e) human
breast cancer cells. f-h, Deletion of KLHL22 or treatment with rapamycin 
suppresses tumour growth of MDA-MB-231 cells. Tumour volume, 

mean ± s.e.m. (n = 9 mice for sgKLHL22_2, n = 10 mice for remaining
groups), ***P < 0.001, two-sided ANOVA (f); tumour images (g); and
tumour weights 20 d after transplantation, mean ± s.e.m. (n = 8 mice
for WT and WT injected with low dose of rapamycin, n = 6 mice for
remaining groups), ***P < 0.001, two-sided Student's t-test (h). For gel
source data, see Supplementary Fig. 1. 


degradation of DEPDC5 and phosphorylation of S6K1, and inhib- 
ited anchorage-independent growth of both cell lines (Fig. 4d, e and 
Extended Data Fig. 9d, e). Finally, we subcutaneously transplanted 
wild-type MDA-MB-231 and MDA-MB-468 cells, or KLHL22 knock- 
out cells, into immunodeficient nude mice. Depletion of KLHL22 
significantly prevented tumour growth (Fig. 4f-h, Extended Data 
Fig. 9f-h). Moreover, treatment with rapamycin suppressed tumour 
growth (Fig. 4f-h, Extended Data Fig. 9i-k). Collectively, these data
demonstrate that KLHL22 functions as a potential oncogene to activate
mTORC1 signalling and promote tumour growth in breast cancer.

In response to amino acids, the inhibitory function of GATOR1 
must be released in order to activate mTORC1. The identification of 
CUL3-KLHL22 E3 ubiquitin ligase provides a novel mechanism for 
amino-acid-induced GATOR1 inactivation. KLHL22 promotes ageing
in C. elegans, and tumour growth in mice, adding to the physiologi- 
cal significance of this regulation (Extended Data Fig. 10). Strikingly, 
tumour samples from breast cancer patients all have elevated KLHL22 
protein levels, which is correlated with the reduction in DEPDC5
levels. Small molecules that inhibit KLHL22 activity on DEPDC5 are
candidates for development as drugs for the treatment of breast cancer
and age-related diseases.

METHODS

Antibodies. Anti-HA (Santa Cruz 7392), anti-FLAG (Sigma F7425), anti-LAMP2 
(IF in HEK293T, Abcam 25631), anti-LAMP2 (IF in MEF, Abcam 13524),
anti-mTOR (CST 2983), anti-LC3B (Sigma L7543), anti-Myc (ImmunoWay YM3203),
anti-Ub (Santa Cruz 8017), anti-KLHL22 (Proteintech 16214-1-AP), anti-KLHL19
(Proteintech 10503-2-AP), anti-NPRL2 (Sigma SAB1305758), anti-NPRL3 (Sigma
HPA011741), anti-DEPDC5 (Sigma SAB1302644), anti-p-S6K1 (CST 9205),
anti-S6K1 (CST 9202), anti-PDI (Santa Cruz 20132), anti-Prohibitin (Santa Cruz
28259), anti-EEA1 (CST 3288), anti-GAPDH (Abcam 128915), anti-Lamin B1
(Bioworld MB2029), anti-Histone H3 (CST 3638), anti-Actin (Sigma A5441), and
anti-14-3-3 (Santa Cruz 23957) were used. 

Cell lines and cell culture. MCF10A, MCF7, SKBR3, BT549, PWR1E, 22RV1,
LNCAP and HSF cells were provided by G. Ouyang (Xiamen University); A375
and MV3 cells were provided by Q. Wu (Xiamen University); MDA-MB-231,
MDA-MB-468, PC3 and MEF cells were gifts from S.-C. Lin (Xiamen
University). HEK293T cells were purchased from ATCC. None of the cell
lines in this study appears in the misidentified cell line list kept by the ICLAC.
Cell lines used in this study were not authenticated immediately before
use in our laboratory. All cell lines were validated to be free of mycoplasma 
contamination. 

All cell lines were maintained at 37°C and 5% CO2, with the exception of
MDA-MB-231 and MDA-MB-468 cells, which were cultured without CO2.
MCF10A cells were maintained in MEBM medium (Lonza) supplemented with
its additives (without using GA-1000) and 100 ng/ml cholera toxin (Sigma). MCF7
cells were maintained in MEM medium (Gibco) supplemented with 10% inactivated
fetal bovine serum (FBS), penicillin and streptomycin (P/S). PC3, 22RV1,
LNCAP and BT549 cells were kept in RPMI 1640 medium (Gibco) supplemented
with 10% FBS and P/S. PWR1E, A375, HSF, MV3,
MEF and HEK293T cells and their derivatives were maintained in DMEM high 
glucose medium (Hyclone) supplemented with 10% FBS and P/S. SKBR3 cells 
were cultured in McCoy’s 5a medium (Gibco) supplemented with 10% FBS and 
P/S. MDA-MB-231 and MDA-MB-468 cells and their derivatives were kept in L15 
medium (Gibco) supplemented with 10% FBS and P/S. Except where indicated, 
data were generated using HEK293T cells. 

For amino acid (AA), leucine or glutamine starvation, cells were incubated 
in handmade DMEM base medium lacking AAs, leucine or glutamine for the 
indicated times. For AA, leucine or glutamine stimulation, cells were starved for
50 min and then stimulated by directly adding a high-concentration stock solution
of AAs, leucine or glutamine for the indicated times. For starvation and re-stimulation
experiments in Fig. 1a and Extended Data Figs 1f, 5a, the medium was supplemented
with dialysed FBS (Gibco, 26400044). For starvation and stimulation experiments in the
remaining figures, insulin (Sigma, 19278) was supplemented at 200 ng/ml. The
high-concentration AA solution was a combination of commercial 50× glutamine-free
AA mixture (Gibco, 11130051) and 50× glutamine (Gibco, 25030081). The
high-concentration leucine and glutamine solutions were prepared in-house.

For drug treatments, MG132 dissolved in DMSO (Santa Cruz 201270) was
pre-diluted with medium to 10 μg/ml. Medium containing MG132 was then used to
replace the original medium and cells were cultured in the presence of MG132
for 1 h. When supplemented along with amino acids, MG132 was continuously
present during starvation, or starvation and re-stimulation periods. Chloroquine
(CQ) (Sigma C6628) or cycloheximide (CHX) (Sigma, C7698) dissolved in water
was added directly into the culturing medium at a final concentration of 50 μM
or 50 μg/ml, respectively.

Nematode strains and culture conditions. C. elegans strains (N2; EU1007: mel-26; 
MAH240: HLH-30::GFP) were obtained from the Caenorhabditis Genetics Center 
(CGC). N2 and HLH-30::GFP strains were cultured at 20°C. mel-26 mutants were 
maintained at 15°C. For lifespan analysis, temperature-sensitive mel-26 mutants 
and the corresponding controls were cultured at 25°C in the presence of fluo- 
rodeoxyuridine (FUDR). For RNAi-mediated gene knockdowns, HLH-30::GFP 
reporter worms were cultured at 20°C. 

Generation of KLHL19 or KLHL22 knock-out cells. CRISPR guide 
sequences targeting the second exon of KLHL19 or KLHL22 were designed
using http://crispr.mit.edu and cloned into pBC2 CRISPR vectors that were provided
by Y. Wang (Peking University). Sequences were as follows:
sgKLHL19_1_human: 5′ TTGGCATCATGAACGAGCTG 3′;
sgKLHL19_2_human: 5′ GGATGCACCGGCCGCCCAGT 3′;
sgKLHL22_1_human: 5′ CACTGCGTGAACAACACCTA 3′;
sgKLHL22_2_human: 5′ GGACAGCGGAATCCTCTTCG 3′;
sgDEPDC5_1_human: 5′ GTGTTCCCTCACATCAAGCT 3′;
sgDEPDC5_2_human: 5′ AGGATCAGTATATTGGCCGT 3′;
sgKLHL19_1_mouse: 5′ TCGCAGGACGGTAACCGAAC 3′;
sgKLHL19_2_mouse: 5′ GGGACGCAGTGATGTATGCC 3′;
sgKLHL22_1_mouse: 5′ ATCGGATTCTGCTAGCTGCA 3′;
sgKLHL22_2_mouse: 5′ GATCCTCTTTGACGTTGTCC 3′.
HEK293T, MEF, MDA-MB-231 or MDA-MB-468 cells were cultured in 6-well plates
and transfected with the corresponding CRISPR vector. Cells were trypsinized
48 h later and 250 μg/ml hygromycin was added into the culturing medium.
Hygromycin-resistant cells were then sorted into 96-well plates and validated
by genotyping.
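As a quick consistency check on the guide sequences listed above, the following minimal sketch confirms that each guide is a 20-nucleotide DNA string and reports its GC content and reverse complement. The helper names (`gc_content`, `reverse_complement`, `check_guides`) and the 20-nt expectation are illustrative assumptions of this sketch; they are not part of the published protocol or of the design tool used.

```python
# Guide sequences (5'->3') transcribed from the Methods text.
GUIDES = {
    "sgKLHL19_1_human": "TTGGCATCATGAACGAGCTG",
    "sgKLHL19_2_human": "GGATGCACCGGCCGCCCAGT",
    "sgKLHL22_1_human": "CACTGCGTGAACAACACCTA",
    "sgKLHL22_2_human": "GGACAGCGGAATCCTCTTCG",
    "sgDEPDC5_1_human": "GTGTTCCCTCACATCAAGCT",
    "sgDEPDC5_2_human": "AGGATCAGTATATTGGCCGT",
    "sgKLHL19_1_mouse": "TCGCAGGACGGTAACCGAAC",
    "sgKLHL19_2_mouse": "GGGACGCAGTGATGTATGCC",
    "sgKLHL22_1_mouse": "ATCGGATTCTGCTAGCTGCA",
    "sgKLHL22_2_mouse": "GATCCTCTTTGACGTTGTCC",
}

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq):
    """Fraction of G or C bases in a DNA sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def check_guides(guides, expected_length=20):
    """Return the names of guides that are not plain ACGT strings of the
    expected length (20 nt is an assumption of this sketch)."""
    bad = []
    for name, seq in guides.items():
        if len(seq) != expected_length or set(seq) - set("ACGT"):
            bad.append(name)
    return bad


if __name__ == "__main__":
    print("Malformed guides:", check_guides(GUIDES))
    for name, seq in GUIDES.items():
        print(f"{name}\tGC={gc_content(seq):.2f}\trevcomp={reverse_complement(seq)}")
```

A check of this kind is useful mainly for catching transcription errors when sequences are copied from a typeset list.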

Generation of stable cell lines. HEK293T stable cell lines were generated using 
a lentiviral system. The following lentiviral expression constructs were used: 
pBOBE-HA-DEPDC5, pBOBE-Myc-Ub, pBOBE-Myc-K48Ub, pBOBE-FLAG-KLHL22
and pLJM1-LAMP1-mRFP-FLAG**. The pBOBE vector was a gift from
S.-C. Lin (Xiamen University). HEK293T cells in 6-well plates were transfected
with the plasmids indicated above, together with pMDLg/pRRE, pRSV-Rev and
pCI-VSVG. Medium was changed 8 h after transfection. Cells were then cultured
for an additional 24 h to allow virus production (virus packaging). HEK293T
cells were cultured in virus-containing medium supplemented with 8 μg/ml
polybrene for 24 h (virus infection). G418 was added to the culturing medium
for selection. 

Immunoprecipitation. Cells were lysed on ice for 40 min using TX-100 
lysis buffer (20 mM Tris-HCl, 150 mM NaCl, 1 mM Na2EDTA, 1 mM EGTA,
2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate, 1% Triton X-100,
pH 7.4) supplemented with EDTA-free protease inhibitors (Roche). Cell lysates
were then centrifuged at 12,000 r.p.m. for 10 min at 4°C. For anti-FLAG and
anti-HA immunoprecipitation, supernatants of cell lysates were supplemented
with washed FLAG or HA affinity gels (Sigma/Pierce) and rotated at 4°C
for 2 h. Immunoprecipitates were then washed three times with TX-100
lysis buffer. Whole cell lysates and immunoprecipitates were analysed by 
immunoblotting. 

For transfection-based experiments, HEK293T cells were plated in 6-cm dishes
and transfected with pcDNA3.3-based expression vectors using polyethylenimine
(PEI) transfection reagent. At 48 h after transfection, cells were subjected to treatments
and then processed as described above.

Ubiquitination assays. HEK293T cells with or without the expression of exogenous
proteins were lysed on ice for 10 min with RIPA lysis buffer (20 mM Tris-HCl,
150 mM NaCl, 1 mM Na2EDTA, 1 mM EGTA, 1% NP-40, 1% sodium deoxycholate,
2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate) containing 1%
SDS supplemented with EDTA-free protease inhibitors (Roche). Cell lysates were
briefly sonicated, boiled at 100°C for 10 min, and then centrifuged at 12,000 r.p.m.
for 10 min at 4°C. Supernatants were then diluted 1:3 with RIPA lysis buffer to
reduce the concentration of SDS. For ubiquitination assay of HA-tagged DEPDC5,
pre-washed HA affinity gels were incubated with diluted lysates for 2 h at 4°C.
For ubiquitination assay of endogenous DEPDC5, anti-DEPDC5 antibodies
were added into diluted lysates for overnight binding. Protein G beads were
then incubated with lysates at 4°C for 1 h to enrich anti-DEPDC5 antibodies.
Immunoprecipitates were washed three times with RIPA lysis buffer containing 
0.1% SDS and analysed by immunoblotting. 

For knockdown experiments, siRNAs were transfected using the 
RNAiMAX transfection reagent (Thermo). At 48 h after transfection, cells
were subjected to ubiquitination assays. Sequences of siRNAs were as fol- 
lows: siKLHL19_1: 5’ GGGAGTACATCTACATGCATT 3’; siKLHL19_2: 5’ 
GAGTGTTACGACCCAGATA 3’; siKLHL22_1: 5’ CAGGCTACGTGCACATTTA 
3’; siKLHL22_2: 5’ GCTCAACAACTTCGTATAC 3’. 

For in vitro ubiquitination assays, 2.5 μl GST-CUL3/RBX1 (BPS Bioscience)
mixed with 2 μg FLAG-KLHL19 or FLAG-KLHL22 purified from HEK293T
cells was used. The E3 ligase complex was mixed with 2 μg HA-DEPDC5 purified
from HEK293T cells, 15 μg wild-type ubiquitin or K48R ubiquitin, 550 ng
E1 (UBE1), 850 ng E2 (UBE2D2), 12.5 mM Mg-ATP and 1× ubiquitin reaction
buffer, and incubated at 37°C for 1 h. The above reagents except for the
annotated ones were purchased from Boston Biochem. Reactions were terminated
by boiling at 100°C with SDS sample buffer for 10 min and analysed by
immunoblotting.

RNA extraction and quantification. Cells or worms were lysed with TRIzol
Reagent. Total RNA was isolated by chloroform extraction and isopropanol
precipitation. One microgram of total RNA was used for reverse transcription
using a cDNA synthesis kit (TransGen Biotech). cDNAs of each gene
were quantified by PCR or real-time PCR (qPCR). For PCR, amplified
cDNAs were resolved by agarose gel electrophoresis. For qPCR, transcript levels
were normalized to GAPDH. The following primers were used:
DEPDC5: 5′ ACCAGACTGTGACTCAAGTG and 5′ ATAGGCACATGTGCTGACC;
GAPDH: 5′ ACCACAGTCCATGCCATCAC and 5′ TCCACCACCCTGTTGCTGTA;
ACT: 5′ ATCTGGCACCACACCTTCTAC and 5′ GGATAGCAACGTACATGGCTG;
KLHL19: 5′ GTGCTGTCATGTACCAGATC and 5′ GGTTGAAGAACTCCTCTTGC;
KLHL22: 5′ GAGAGTGGAAGCACTTCACTG and 5′ GCGTAGATGTACCTGCCTACA;
mel-26: 5′ CGAGCTGTTACGTACATCCTG and 5′ AGTCCAGATGGAGGTGGTAG;
tag-53: 5′ CGCTTTGATCACACAGTTGTC and 5′ TCACATGAGCTGAGTGACCTG;
T08A11.1: 5′ TCTGTGCTCACGGTGGTATG and 5′ CGAATTGAACCTGTGGAAGC;
rpl-32: 5′ AGGGAATTGATAACCGTGTCCG and 5′ TAGGACTGCATGAGGAGCATGT.
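The Methods state only that qPCR measurements were normalized to GAPDH; the exact calculation is not given. One common way to perform such a normalization is the comparative Ct (2^-ΔΔCt) approach, sketched below under the assumption of roughly 100% amplification efficiency for both primer pairs. The function name and the Ct values are hypothetical and serve only to illustrate the arithmetic.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target transcript relative to a control sample.

    Uses the comparative Ct (2^-ddCt) calculation, which assumes ~100%
    amplification efficiency. This is an assumption of the sketch and is
    not stated in the Methods.
    """
    # Normalize the target to the reference gene (e.g. GAPDH) in each sample.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl
    # Compare the treated sample with the control sample.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** -delta_delta_ct


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold two cycles earlier
    # in the sample than in the control, with an unchanged reference gene.
    fold = relative_expression(24.0, 18.0, 26.0, 18.0)
    print(f"Relative expression: {fold:.1f}-fold")  # 4.0-fold
```

If primer efficiencies differ appreciably from 100%, an efficiency-corrected calculation would be more appropriate than this simple form.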
Nuclear/cytosolic fractionation. After amino acid treatment, cells in 6-well
plates were lysed on ice for 10 min with 200 μl per well NP-40 lysis buffer (10 mM
Tris-HCl, 150 mM NaCl, 0.05% NP-40) supplemented with EDTA-free protease
inhibitors (Roche). Samples were centrifuged at 5,000 r.p.m. for 5 min at 4°C to
isolate the pellets (nuclei) and supernatants (cytosol). Nuclear and cytosolic fractions
were boiled with SDS sample buffer and then analysed by immunoblotting.

Purification of recombinant proteins. Recombinant human HA-DEPDC5,
FLAG-KLHL22, FLAG-KLHL19 and HA-KLHL22 were immunopurified from
HEK293T cells. For purification of each protein, 10 × 15-cm dishes of HEK293T
cells transfected with corresponding expression vectors were lysed on ice for
40 min using TX-100 lysis buffer supplemented with EDTA-free protease inhibitors
(Roche). Cell lysates were then centrifuged at 12,000 r.p.m. for 10 min at 4°C.
Supernatants were mixed with 300 μl HA or FLAG affinity gels and rotated at 4°C
for 2 h. Immunoprecipitates were washed three times with TX-100 lysis buffer
and three times with PBS, and eluted in 500 μl PBS containing 100 μg/ml HA or
FLAG peptides. Purified proteins were analysed by Coomassie G250 staining
and immunoblotting. For storage, purified proteins supplemented with 20% (w/v)
glycerol were kept at −20°C.

Silver staining and mass spectrometry. To identify interacting proteins of
DEPDC5 or KLHL22, 5 × 15-cm dishes of HEK293T cells stably expressing
HA-DEPDC5 or FLAG-KLHL22 were lysed with TX-100 lysis buffer containing
EDTA-free protease inhibitors (Roche). Cell lysates were centrifuged at 12,000
r.p.m. for 10 min at 4°C. HA or FLAG affinity gels were added to supernatants for
immunoprecipitation. Immunoprecipitates were denatured by boiling in SDS sample
buffer and resolved by SDS-PAGE. Silver staining was carried out according
to the manufacturer’s instructions (Sigma, PROTSIL2). For interaction partners
of HA-DEPDC5, bands of interest were cut from the gel and analysed by mass
spectrometry; for FLAG-KLHL22, the whole lane was analysed.

Immunostaining. HEK293T cells were cultured on polylysine-coated glass coverslips
(Corning) in 12-well plates (70,000 cells per well). After amino acid treatment,
cells on coverslips were washed once with PBS, fixed with 4% paraformaldehyde
in PBS, washed again with PBS and permeabilized with 0.1% Triton X-100 in PBS.
Cells were then washed, and blocked in PBS containing 5% BSA for 30 min at
room temperature. The coverslips were then incubated in PBS containing primary
antibodies (1:200 dilutions) at 4°C overnight. Cells were then rinsed, and incubated
with secondary antibodies in PBS (1:200 dilutions) for 2 h at room temperature.
After washing, coverslips were mounted on slides using a mounting buffer containing
DAPI (Thermo) and imaged on Zeiss Fluorescence Microscopes (Imager
M2 for Fig. 3c-e; LSM 710 confocal microscope for other figures).

In vitro reconstitution assay. For immunopurification of lysosomes, HEK293T 
cells stably expressing LAMP1-mRFP-FLAG”? were used. For each sample, cells 
in one 10-cm dish were pelleted through centrifugation at 1,000 r.p.m. for 3 min 
at room temperature. Cell pellets were washed once with fractionation buffer 
(50 mM KCl, 90 mM K-gluconate, 1 mM EGTA, 5 mM MgCl2, 50 mM sucrose,
5 mM glucose, 20 mM HEPES, pH 7.4; 2.5 mM ATP and protease inhibitors were
added immediately before use), resuspended in 0.8 ml fractionation buffer, and
then mechanically broken by spraying six times through a 23 G needle attached
to a 1-ml syringe. Cell fractions were spun down at 2,000g for 10 min at 4°C to
pellet the nuclei, yielding a post-nuclear supernatant (PNS). The PNS was adjusted
to 2 ml, supplemented with 50 μl anti-FLAG affinity gels and rotated at 4°C for 2 h
to enrich lysosomes. 

Immunopurified lysosomes were washed and resuspended in 300 μl fractionation
buffer supplemented with 250 μM GTP and 100 μM GDP, in the presence or
absence of 1× amino acids. Lysosomes were then rotated at 650 r.p.m. on a thermomixer
for 15 min at 37°C (activation step). The reaction system was subsequently
supplemented with 20 μl purified HA-KLHL22 and rotated for an additional 25 min
(binding step). Immunoprecipitates were washed, denatured and then analysed 
by immunoblotting. 

In vitro binding assay. HEK293T cells in 10-cm dishes were transfected with 
vectors expressing HA-DEPDC5, HA-DEPDC5-5KR or HA-DEPDC7. At 48 h after
transfection, cells were lysed in 2 ml ice-cold TX-100 lysis buffer containing
EDTA-free protease inhibitors (Roche). Cell lysates were centrifuged at 12,000 r.p.m. for
10 min at 4°C. Supernatants were saved, supplemented with 30 μl anti-HA affinity
beads, and rotated at 4°C for 2 h. Immobilized HA-DEPDC5, HA-DEPDC5-5KR
or HA-DEPDC7 was washed once in TX-100 lysis buffer, followed by three washes
in TX-100 lysis buffer supplemented with 500 mM NaCl. TX-100 lysis buffer was
then replaced with 88 μl binding buffer (40 mM HEPES, 2 mM EGTA, 2.5 mM
MgCl2, 0.3% CHAPS, pH 7.4). For in vitro binding assay, 88 μl immobilized
HA-DEPDC5, HA-DEPDC5-5KR or HA-DEPDC7 was incubated with 12 μl
immunopurified FLAG-KLHL22 and supplemented with 1% BSA and 2 mM DTT
at 4°C for 1 h. Samples were washed three times in binding buffer and subjected
to immunoblotting. 

Determination of cell size. Cells at a density of ~4 × 10° per ml were subjected
to cell diameter determination using an easy cell analyser (CountStar).

HLH-30::GFP nuclear translocation assay. The HLH-30::GFP reporter strain was
crossed with mel-26 mutants to generate mel-26; HLH-30::GFP. HLH-30::GFP
or mel-26; HLH-30::GFP reporter strains subjected to the indicated treatments
were photographed using a Zeiss Imager M2 microscope. Representative images 
are shown. 

RNAi in C. elegans. RNAi clones from the Ahringer Library were grown in LB 
containing 50 μg/ml carbenicillin at 37°C overnight and then seeded onto worm
plates with IPTG. Dried plates were kept at room temperature overnight to allow 
IPTG induction of dsRNA expression. Worms were then seeded onto RNAi plates. 
Knockdown efficiencies were measured using qPCR. 

Lifespan analysis. To compare the lifespan of mel-26 mutants with that of N2 
worms, about 100 L4 stage (day 0) N2 or mel-26 mutant worms were transferred 
to plates containing E. coli OP50 and FUDR to prevent reproduction, and cultured 
at 25°C. For lifespan analysis of mel-26 mutants with T08A11.1 knockdown, about
100 L4 stage mel-26 worms were transferred to plates containing bacteria HT115 
expressing dsRNAs, and cultured at 25°C. Worms were transferred to new plates 
and counted every other day. FUDR was omitted after day 12. Animals that did not 
move when gently prodded were scored as dead. Animals that crawled off the plate 
or died owing to vulva bursting were not included. Sample size was determined by 
reference to the literature. Worms were chosen in an unbiased fashion for experi- 
mental analysis to ensure randomization. The investigators who collected the data 
were blinded to genotype and treatment. 
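Lifespan data of this kind are usually summarized as survival curves and compared with a log-rank test (the test named in the Fig. 3 legend). The sketch below is a minimal, standard-library illustration of a Kaplan-Meier estimate and a two-group log-rank test. It is not the authors' workflow (their statistics were run in GraphPad Prism 5), and the function names and example day counts are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  day on which each animal died or was censored
    events: True if death was observed, False if censored
    Returns a list of (day, surviving fraction) at each day with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        day = data[i][0]
        same_day = [e for t, e in data if t == day]
        deaths = sum(same_day)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((day, survival))
        at_risk -= len(same_day)
        i += len(same_day)
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, two-sided P value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    death_days = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for day in death_days:
        n_a = sum(1 for t in times_a if t >= day)
        n_b = sum(1 for t in times_b if t >= day)
        d_a = sum(1 for t, e in zip(times_a, events_a) if t == day and e)
        d_b = sum(1 for t, e in zip(times_b, events_b) if t == day and e)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical days of death for two small groups of worms.
    control = [10, 11, 12, 13, 14]
    mutant = [20, 21, 22, 23, 24]
    print(kaplan_meier(control, [True] * 5))
    print(log_rank(control, [True] * 5, mutant, [True] * 5))
```

Animals excluded from scoring (for example, those that crawled off the plate) would simply be left out of the input, or passed with `events` set to `False` if they are to be treated as censored.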

Anchorage-independent growth assays. For the bottom-layer agarose medium, a 
1:1 mixture of 1% low melting point agarose (Sigma) and pre-warmed 2× culture
medium (prepared from powdered DMEM or L15 medium, Gibco) was added
to 6-well plates (1.5 ml per well) and solidified at 20°C for 30 min. A 1:1 mixture of
0.6% low melting point agarose and cells in 1× culture medium with or without
100 nM rapamycin was poured onto the solidified bottom medium and allowed to
solidify at 20°C for 30 min, forming the top-layer agarose medium. Concentrations
of HEK293T cells and MDA-MB-231/468 cells in 1× medium were 6,500 cells
per ml and 2,500 cells per ml, respectively. One-hundred microlitres per well of 1× culture
medium was added twice weekly onto the top medium to prevent drying. Cells
were cultured at 37°C with 5% CO2 for around two weeks for adequate colony
formation. Colonies were stained with 0.005% crystal violet in 5% methanol and
quantified using ImageJ.

Tumour transplantations. Trypsinized MDA-MB-231 or MDA-MB-468 cells were 
washed twice with PBS and concentrated to 10° per 100 μl in PBS. Cell suspensions
were then mixed with equal volumes of Matrigel (Corning). Two-hundred microlitres
of cells mixed with Matrigel were then loaded into a 1-ml insulin syringe (BD)
and subcutaneously injected into the right back flank of an 8-week-old female
BALB/c nude mouse. Rapamycin (LC Laboratories) in ethanol at 10 mg/ml was
diluted in 5% Tween-80 (Sigma) and 5% PEG-400 (Sigma). Treatment was conducted
by intraperitoneal injection of 1.5 mg (low dose) or 4.5 mg (high dose)
rapamycin every other day, starting at day 0 (the day of transplantation) or day
6. Xenografts were measured with a caliper every other day after transplantation
(tumour volume = width² × length × 0.523). Tumour size must not exceed 20 mm
at the largest diameter in an adult mouse, according to the ACUC and IRB of
Beijing Laboratory Animal Research Center (BLARC). None of the experiments
exceeded this limit in our study. Mice were killed when tumours reached 15 mm at
the largest diameter or ulceration was evident. BALB/c nude mice were purchased
from Charles River Laboratories and kept in a specific-pathogen-free facility at
BLARC. Animals were maintained in accordance with institutional guidelines.
We complied with all relevant ethical regulations of IACUA. Sample size was
determined by reference to the literature. Mice were chosen in an unbiased fashion for
experimental analysis to ensure randomization. The technicians who collected the 
data were blinded to genotype and treatment. 
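
As an illustration of the caliper-based volume estimate described above, the following minimal Python sketch evaluates the stated formula (width squared × length × 0.523) and checks a measurement against the 20 mm diameter limit. The function names and the example measurements are hypothetical and are not taken from the study.

```python
def tumour_volume_mm3(width_mm: float, length_mm: float) -> float:
    """Estimate tumour volume as width^2 x length x 0.523 (formula from the Methods)."""
    if width_mm <= 0 or length_mm <= 0:
        raise ValueError("caliper measurements must be positive")
    return width_mm ** 2 * length_mm * 0.523


def exceeds_diameter_limit(width_mm: float, length_mm: float,
                           limit_mm: float = 20.0) -> bool:
    """Return True if the largest measured diameter is above the stated limit."""
    return max(width_mm, length_mm) > limit_mm


# Hypothetical example measurement: 10 mm wide, 15 mm long.
example_volume = tumour_volume_mm3(10.0, 15.0)
```

The multiplier 0.523 is close to π/6, so the formula treats the xenograft as an ellipsoid whose depth equals its width.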

Patient samples. Breast cancer patient samples were all from Peking University 
Cancer Hospital, which owns a large breast cancer Biobank. Fresh breast tumour 
samples were obtained from patients and stored at −80 °C for further use. The
tumours were all stage I–III. Tumour stage was classified according to the
tumour-node-metastasis (TNM) classification of the Union for International Cancer
Control. This study was conducted in accordance with the ethics principles of
the Declaration of Helsinki and approved by the Research and Ethics Committee 
of Peking University Cancer Hospital. All patients provided written informed 
consent. 

Statistics and reproducibility. Unless otherwise stated, no statistical methods 
were used to predetermine sample size, the experiments were not randomized, and 
the investigators were not blinded to allocation during experiments and outcome 
assessment. The epidemiological data for association between transcript levels 
of KLHL22 and each cancer type, and the related statistical significances, were 
provided by the Oncomine platform. For graphs with error bars or statistical sig- 
nificance, details of reproducibility and statistics are indicated in the corresponding 
figure legends. All statistical analyses in our study were conducted using GraphPad 
Prism 5. For other graphs showing representative data, reproducibility is stated
below: (1) n > 3 biologically independent experiments for: Figs. 1a–j; 2a–e, g
(statistical analysis in Extended Data Fig. 6f, h); 3a–e; 4a, d, e; Extended Data
Figs. 1a, b (immunoblotting), c–f; 2a, d; 3b–e; 4a–d, f–h; 5a, b, d, f–h; 6b–d; 8a–c,
e, g, h (immunoblotting), i; 9c–e. (2) n = 3 technically independent experiments
for Fig. 4c. (3) n = 1 experiment for Extended Data Figs. 2b, c; 5e, i.

Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this paper.

Data availability. Source Data are available in the online version of the paper.
The authors declare that all data supporting the findings of this study are available
within the paper and its Supplementary Information files.
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Extended Data Fig. 1 | DEPDC5 undergoes ubiquitin-mediated 
degradation in response to amino acids. a, Protein levels of endogenous
DEPDC5 are regulated in response to the availability of amino acids. Basal,
standard culture condition. b, Amino acids do not affect mRNA levels of
DEPDC5. Relative mRNA levels of DEPDC5 to GAPDH were quantified
by qPCR. n = 3 biologically independent experiments. ns, no significant
difference; two-sided Student's t-test. c, d, DEPDC5 proteins are unstable
under normal conditions. CHX, cycloheximide. e, Proteasome inhibitor
MG132 increases the protein levels of DEPDC5. f, The ubiquitination of
DEPDC5 is regulated in an amino-acid-dependent manner. For gel source
data, see Supplementary Fig. 1.
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Extended Data Fig. 2 | CUL3-KLHL22 mediates the ubiquitination of 
DEPDC5. a, KLHL22, but not other E3 ligases, promotes the degradation
of DEPDC5. b, DEPDC5 interacts with CUL3 and RBX1. Cells stably
expressing HA-DEPDC5 were subjected to anti-HA immunoprecipitation.
Arrows indicate protein bands on silver staining that were analysed
by mass spectrometry. c, Purification of recombinant HA-DEPDC5,
FLAG-KLHL19 or FLAG-KLHL22 proteins in HEK293T cells. d, CUL3-KLHL22
E3 ligase catalyses DEPDC5 ubiquitination in a cell-free system.
Immunopurified HA-DEPDC5 and recombinant wild-type (WT) or
mutant (K48R) ubiquitin were incubated with recombinant CUL3-RBX1
and KLHL22 or KLHL19. Asterisk represents HA-DEPDC5. For gel source
data, see Supplementary Fig. 1.
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Extended Data Fig. 3 | KLHL22 promotes the ubiquitination of 
DEPDC5 on multiple lysine residues. a, Schematic depicting the
predicted ubiquitination sites of DEPDC5. b, c, The 5KR mutation
prevents K48-linked ubiquitination (b) and degradation (c) of DEPDC5.
d, e, Single mutation of each lysine residue does not impair K48-linked
ubiquitination (d) and degradation (e) of DEPDC5. For gel source data,
see Supplementary Fig. 1.
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from DEPDCS in response to amino acid starvation. d, Recombinant each serine, threonine or tyrosine residue in DEP domain to alanine does 
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motif prevents KLHL22-mediated degradation of DEPDCS. h, Mutation of 
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Extended Data Fig. 5 | 14-3-3 regulates nuclear-cytosolic shuttling 

of KLHL22. a, b, Nuclear-cytosolic shuttling of KLHL22 is regulated by
amino acids. DAPI, nuclei. Scale bar, 10 μm. c, Schematic depicting the
reported phosphorylation sites of KLHL22. d, S18A mutation prevents
nuclear accumulation of KLHL22 in amino-acid-deficient conditions.
Scale bar, 10 μm. e, Schematic depicting the interaction between
FLAG-KLHL22 and 14-3-3 in mass spectrometry analysis using FLAG-KLHL22
as the bait. f, S18A mutation prevents the interaction between KLHL22
and 14-3-3 in amino-acid-deprived conditions. g, KLHL22(S18A) mutant
promotes S6K1 phosphorylation in amino-acid-deficient conditions.
h, Immunopurification of lysosomes. LAMP2 (lysosome), EEA1 (early
endosome), prohibitin (mitochondria), PDI (endoplasmic reticulum) and
histone H3 (nucleus). i, Purification of HA-KLHL22. For gel source data,
see Supplementary Fig. 1.
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Extended Data Fig. 6 | KLHL22 is essential for activation of mTORC1 
and downstream events. a, Schematic depicting CRISPR-Cas9-mediated 
knockout of KLHL19 or KLHL22 in HEK293T cells. b, KLHL22 is required 
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Extended Data Fig. 7 | KLHL22 has a conserved role in mice and 


nematodes. a, Schematic depicting CRISPR-Cas9-mediated knockout of
KLHL19 or KLHL22 in MEF cells. b, Predicted orthologues of KLHL22
in C. elegans. c, Knockdown efficiency of tag-53, mel-26 and T08A11.1.
n = 2 biologically independent experiments. d, RNAi targeting T08A11.1
suppresses the developmental delay of worms induced by mel-26 RNAi,
but not tag-53 RNAi. n = 3 biologically independent experiments. ns, no
significant difference; *P < 0.05; **P < 0.01; ***P < 0.001; two-sided
Student's t-test.
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Extended Data Fig. 8 | Protein levels of KLHL22 are elevated in human 
breast cancer. a, Stable expression of HA-DEPDC5 reduces the size
of HEK293T cells. b, Stable expression of FLAG-KLHL22 reverses the
reduction in cell size in HEK293T cells stably expressing HA-DEPDC5.
c, High expression of FLAG-KLHL22 transforms HEK293T cells.
d, According to the Oncomine database, mRNA levels of KLHL22 are
elevated in breast and prostate cancers and in melanoma. Fold changes
of KLHL22 transcript levels in tumour tissues (T) relative to the corresponding
normal tissues (NT) are shown. Rank represents the KLHL22 rank in the ordered
list of genes that are upregulated. e, Protein levels of KLHL22 in breast
cancer, prostate cancer and melanoma cell lines and corresponding
normal cells (MCF10A, PWR-1E and HSF). f, mRNA levels of KLHL22
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*P < 0.05; **P < 0.01; ***P < 0.001; two-sided Student's t-test. g, Protein 
levels of endogenous DEPDC5 are regulated in an amino-acid-sensitive
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does not affect mRNA levels of DEPDC5 in MDA-MB-231 cells. Relative 
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Supplementary Fig. 1. 
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Extended Data Fig. 9 | Deletion of KLHL22 in breast cancer cells 
prevents mTORC1 activation and tumorigenesis. a, b, Schematic
of CRISPR-Cas9-mediated knockout of KLHL19 or KLHL22 in
MDA-MB-231 (a) or MDA-MB-468 (b) breast cancer cells. c, Protein
levels of KLHL22 are upregulated in MDA-MB-231/468 cells.
d, e, Deletion of KLHL22 in MDA-MB-231 (d) or MDA-MB-468 (e)
cells suppresses amino-acid-induced DEPDC5 degradation and S6K1
phosphorylation. f–h, Deletion of KLHL22 suppresses tumour growth
of MDA-MB-468 cells. Tumour volume, mean ± s.e.m. (n = 9 mice for
WT, n = 8 mice for sgKLHL22_1), ***P < 0.001, two-sided ANOVA
(f); tumour images (g); tumour weights 20 days after transplantation,
mean ± s.e.m. (n = 6 mice for WT and n = 8 mice for KLHL22 knockouts),
***P < 0.001, two-sided Student's t-test (h). i–k, Deletion of KLHL22 or
treatment with rapamycin suppresses tumour growth of MDA-MB-231
cells. Tumour volume, mean ± s.e.m. (n = 9 mice for sgKLHL22_2 and WT
injected with low dose of rapamycin, n = 10 mice for remaining groups),
***P < 0.001, two-sided ANOVA (i); tumour images (j); and tumour
weights 20 days after transplantation, mean ± s.e.m. (n = 8 mice for WT
and WT injected with low or high dose of rapamycin; n = 6 mice for
KLHL22 knockouts), ***P < 0.001, two-sided Student's t-test (k). For gel
source data, see Supplementary Fig. 1.
Extended Data Fig. 10 | Proposed model of KLHL22-mediated
regulation of mTORC1, ageing and cancer. In response to amino acids,
KLHL22 translocates from the nucleus to the cytosol, where it accumulates
on the surface of lysosomes to mediate the ubiquitination and degradation
of DEPDC5, an essential subunit of GATOR1. GATOR1 loss of function
activates Rag GTPases and subsequently activates mTORC1 (top).
KLHL22-mediated DEPDC5 degradation is a conserved mechanism of
TORC1 regulation in mammals and nematodes. TORC1 hypoactivation
due to mel-26 (Ce.KLHL22) depletion extends C. elegans lifespan, whereas
TORC1 hyperactivation due to elevated expression of KLHL22 promotes
human breast cancer (bottom).
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Structural basis for gating pore current in periodic 


paralysis 


Daohua Jiang, Tamer M. Gamal El-Din, Christopher Ing, Peilong Lu, Régis Pomés, Ning Zheng & William A. Catterall


Potassium-sensitive hypokalaemic and normokalaemic periodic
paralysis are inherited skeletal muscle diseases characterized by
episodes of flaccid muscle weakness. They are caused by single
mutations in positively charged residues ('gating charges') in the S4
transmembrane segment of the voltage sensor of the voltage-gated
sodium channel NaV1.4 or the calcium channel CaV1.1. Mutations
of the outermost gating charges (R1 and R2) cause hypokalaemic
periodic paralysis by creating a pathogenic gating pore in the
voltage sensor through which cations leak in the resting state.
Mutations of the third gating charge (R3) cause normokalaemic
periodic paralysis owing to cation leak in both activated and
inactivated states. Here we present high-resolution structures of
the model bacterial sodium channel NaVAb with the analogous
gating-charge mutations, which have similar functional effects
as in the human channels. The R2G and R3G mutations have no
effect on the backbone structures of the voltage sensor, but they
create an aqueous cavity near the hydrophobic constriction site that
controls gating charge movement through the voltage sensor. The
R3G mutation extends the extracellular aqueous cleft through the
entire length of the activated voltage sensor, creating an aqueous
path through the membrane. Conversely, molecular modelling
shows that the R2G mutation creates a continuous aqueous path
through the membrane only in the resting state. Crystal structures
of NaVAb(R2G) in complex with guanidinium define a potential
drug target site. Molecular dynamics simulations illustrate the
mechanism of Na+ permeation through the mutant gating pore in
concert with conformational fluctuations of the gating charge R4.
Our results reveal pathogenic mechanisms of periodic paralysis
at the atomic level and suggest designs of drugs that may prevent
ionic leak and provide symptomatic relief from hypokalaemic and
normokalaemic periodic paralysis.

NaV1.4 channels generate action potentials that initiate muscle contraction.
They are complexes of a pore-forming α-subunit and auxiliary
β1 subunits. The α-subunit contains four homologous domains
(I–IV), each with six transmembrane segments (S1–S6). Segments S1–S4
form the voltage sensor, and every third residue in S4 is positively
charged. Upon depolarization, S4 moves outward through a narrow
gating pore formed by S1–S3, catalysed by interactions with negative
or polar residues in S2 and S3. The voltage sensor has an hourglass
shape, with a narrow hydrophobic constriction site (HCS) that separates
extracellular and intracellular compartments. Water-filled
crevices on either side of the HCS focus the membrane electric field,
assuring efficient coupling of voltage to conformational changes that
open the central pore. Mutations in the arginine gating charges that
occupy the HCS cause state-dependent cation leak through the voltage
sensor, which we term 'gating pore current'.

Missense mutations of arginine gating charges in S4 of NaV1.4
cause hypokalaemic periodic paralysis and normokalaemic periodic
paralysis. Mutations of R1 in domains I or III to H or Q, or mutation
of R2 in domains I, II and III to W, G, Q or S cause hypokalaemic
periodic paralysis. Mutations of R3 in domain II to G, Q or W,
or of R3 in domain III to H or C cause normokalaemic periodic paralysis.
All these mutations result in non-selective gating pore current
through the voltage sensor. Increased inward leak leads to Na+
overload, sustained depolarization and action potential failure, which
paralyze skeletal muscles. These pathophysiological effects suggest
that mutations that cause hypokalaemic periodic paralysis result in
an open aqueous pathway for ion movement in the resting state of the
voltage sensor, but not in the activated state, and mutations that cause
normokalaemic periodic paralysis result in an open aqueous pathway
in the activated state, but not in the resting state. Molecular models
and mutagenesis studies support this hypothesis. To provide
direct structural evidence for this pathophysiological mechanism, we
introduced mutations known to cause periodic paralysis into NaVAb,
a voltage-gated Na+ channel from Arcobacter butzleri, the structure of
which has been solved at high resolution. We characterized the resulting
gating pore currents, solved the structures of mutant gating pores
without and with a bound permeant ion, and investigated molecular
dynamics of ion movement through the gating pores.

To reconstitute pathogenic hypokalaemic periodic paralysis gating
pore currents in NaVAb, we mutated R2 to S (R2S, analogous to
NaV1.4(R672S)) and expressed the mutant in Trichoplusia ni insect cells.
Transfected cells were voltage-clamped to −200 mV and depolarized in
10-mV steps to record Na+ currents. Half-maximal activation of central
pore currents was observed at −105 ± 0.6 mV (Fig. 1a). To measure
gating pore currents, cells were held at −100 mV, at which NaVAb is in
the slow-inactivated state and exhibits no central pore current. Gating
pore current was examined by applying pulses from +100 to −200 mV
in −10-mV steps. A nonlinear leak current component was observed
in the resting state, beginning at −110 mV and increasing to −200 mV
(Fig. 1b, c).

Mutations of the gating charge R3 that cause normokalaemic
periodic paralysis (NaV1.4(R675G/Q/W)) induce outward gating
pore current in activated but not in resting states. In NaVAb(R3G),
central pore current was activated between −50 mV and 0 mV
(Fig. 1d; half-maximal activation at −24.8 ± 1.1 mV). Steady-state
inactivation was observed from −90 mV to −10 mV with half-maximal
inactivation at −47.7 ± 0.4 mV (Fig. 1d). NaVAb(R3G) conducted outward
gating pore current in both activated and inactivated states at potentials
more positive than −60 mV (Fig. 1e, f). These physiological
studies demonstrate that NaVAb provides an accurate model of NaV1.4,
because gating pore current is observed only in the resting state for
NaVAb(R2S) and only in the activated and inactivated states for
NaVAb(R3G).
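
To make the half-maximal voltages and slope factors concrete, here is a minimal Python sketch of the Boltzmann relations that are conventionally fitted to conductance-voltage and steady-state inactivation curves. The text does not state which fitting function was used, so the functional form is an assumption. The function names are hypothetical, and the slope factors (k) are taken from the Fig. 1 legend.

```python
import math


def boltzmann_activation(v_mv: float, v_half_mv: float, k_mv: float) -> float:
    """Fraction of maximal conductance at v_mv (assumed two-state Boltzmann form)."""
    return 1.0 / (1.0 + math.exp((v_half_mv - v_mv) / k_mv))


def boltzmann_inactivation(v_mv: float, v_half_mv: float, k_mv: float) -> float:
    """Fraction of channels still available at v_mv (assumed Boltzmann form)."""
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / k_mv))


# Parameters reported for the mutants (mV); k values come from the Fig. 1 legend.
r2s_activation = {"v_half_mv": -105.0, "k_mv": 10.0}
r3g_activation = {"v_half_mv": -24.8, "k_mv": 9.0}
r3g_inactivation = {"v_half_mv": -47.7, "k_mv": 7.5}

# Fractional activation of the R3G central pore across part of the tested range.
for v in range(-60, 10, 10):
    g = boltzmann_activation(v, **r3g_activation)
    print(f"{v:5d} mV  G/Gmax = {g:.3f}")
```

By construction, each curve equals 0.5 at its half-maximal voltage, and k sets how many millivolts are needed for an e-fold change in the open-to-closed ratio.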

The pathogenic effects of gating pore mutations depend on inward
leak of Na+. The R2S mutant gating pore was not significantly selective
among Cs+, K+ or Na+ (Fig. 1g, P > 0.7). As is the case for NaV1.4, the
gating pore of NaVAb(R2S) was exceptionally permeant to guanidinium
(about 28-fold greater than Na+), but it was less permeant to
methylguanidinium and ethylguanidinium (Fig. 1g). The outward gating pore
currents conducted by NaVAb(R3G) were higher for Cs+ than for K+ or
Na+, which were similar to each other (Fig. 1h). However, NaVAb(R3G)
Fig. 1 | Functional properties of NaVAb(WT), NaVAb(R2S) and
NaVAb(R3G). a, Central pore Na+ currents (inset) and conductance–voltage
(G–V) curve for NaVAb(R2S) during 200-ms depolarizations from
−200 mV to the indicated potentials. Voltage for half-maximal
activation, −105 ± 0.6 mV; slope factor, k = 10 ± 0.9. n = 4. b, c, Gating
pore Na+ currents (gp) and current–voltage (I–V) curves for NaVAb(R2S)
(blue) or NaVAb(WT) (black) during test pulses from −100 mV to the
indicated potentials. n = 10. d, Central pore Na+ currents (inset) and
G–V curve for NaVAb(R3G) during depolarizations from −160 mV to
the indicated potentials (filled circles; voltage for half-maximal
activation, −24.8 ± 1.1 mV; k = 9 ± 1; n = 4). Voltage dependence of
steady-state inactivation (open circles)

was less permeant to guanidinium than to Na+ (Fig. 1h), and it was
more than 16-fold less permeant to guanidinium than NaVAb(R2S). The
weak selectivity of R2S and R3G mutants for different inorganic cations
and the high guanidinium permeability through the R2S mutant are
characteristic of the corresponding mutations in NaV1.4, further
supporting the validity of NaVAb as a model for structural studies of
gating pore mutations.
for NaVAb(R3G) (voltage for half-maximal inactivation, −47.7 ± 0.4 mV;
k = 7.5 ± 0.3; n = 4). e, f, Gating pore Na+ currents
and I–V curves for NaVAb(R3G) (red) or NaVAb(WT) (black) for voltage
steps from 0 mV to the indicated potentials. n = 11. g, Gating pore
current through NaVAb(R2S) for Cs+ (n = 5), K+ (n = 7), Na+ (n = 5),
N-methyl-D-glucamine (NMDG, n = 5), guanidinium (G, n = 7),
methylguanidinium (MG, n = 5) and ethylguanidinium (EG, n = 5) at
−200 mV. ***P = 0.00029. h, Gating pore current through NaVAb(R3G)
for Cs+ (n = 4), Na+ (n = 6), K+ (n = 6), guanidinium (n = 4) and NMDG
(n = 4) at +100 mV. **P = 0.0011. Student's t-test, two-sided.


To elucidate the structure of a pathogenic gating pore in its conductive
conformation in an activated voltage sensor, we solved
the structure of a NaVAb analogue of a normokalaemic periodic
paralysis-causing mutation, NaVAb(R3G), at 2.7 Å resolution (Fig. 2).
Voltage-gated sodium channels have a central pore module surrounded
by four symmetrically located voltage sensors (Fig. 2a). The voltage
sensors of NaVAb and NaV1.4 are very similar in amino acid sequence


Fig. 2 | Structures of the voltage sensor of NaVAb(WT) and NaVAb(R3G).
a, Structure of NaVAb(R3G) in top view. b, Comparison of
the conformations of NaVAb(WT) (grey) and NaVAb(R3G) (rainbow)
voltage sensor in side view. c–e, Structures of NaVAb(WT) voltage
sensor. c, Side view highlighting gating charges in sticks. d, Top view
in space-filling format. e, MOLE2 analysis of water-accessible space
in magenta. f–h, Structures of NaVAb(R3G) voltage sensor. f, Side view
highlighting gating charges. g, Top view in space-filling format.
h, MOLE2 analysis of water-filled space in magenta. Green spheres in
f and h indicate the positions of the missing side chain of R3. In d
and g, the dotted red line circles the position where the gating pore
would be in the activated state and the solid red line circles the
open gating pore, respectively. See Extended Data Table 1 for details.
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and structure (Extended Data Figs. 1 and 2). The voltage sensors of 
wild-type (WT) NaVAb and NaVAb(R3G) crystallize in the same conformation,
with a Cα root mean square deviation (r.m.s.d.) of 0.39 Å
(Fig. 2b, Extended Data Fig. 3). These results indicate that the R3G
mutation does not perturb the overall structure of the voltage sensor
and, therefore, that its pathogenic effects are caused by the loss of the
R3 side chain. These channels crystallize with an activated voltage
sensor (Fig. 2c), as would be expected at 0 mV. In NaVAb(WT), R1,
R2 and R3 are located extracellularly relative to the HCS, and their
side chains point outward, toward the extracellular milieu (Fig. 2c). By
contrast, R4 is located intracellularly relative to the HCS and its side
chain points inward towards the cytosol (Fig. 2c). When viewed from
the extracellular side, there is no water-accessible path into the cell
through the wild-type voltage sensor (Fig. 2d); however, we observed
a deep solvent-accessible cleft extending down to the R4 side chain in
NaVAb(R3G) (Fig. 2g).
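
For readers unfamiliar with the r.m.s.d. metric quoted above, the following minimal Python sketch computes it for two equal-length lists of Cα coordinates. It assumes the two structures have already been superposed (the alignment step is not shown), and the function name and toy coordinates are hypothetical rather than taken from the deposited structures.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root mean square deviation (in the input units) between paired C-alpha atoms.

    Both inputs are sequences of (x, y, z) tuples in the same residue order,
    and are assumed to be superposed already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms, for illustration only.
wild_type = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mutant = [(0.0, 0.3, 0.0), (3.8, 0.0, 0.4), (7.6, 0.0, 0.0)]
toy_rmsd = ca_rmsd(wild_type, mutant)
```

A value well below 1 Å, such as the 0.39 Å reported here, is generally read as two backbones that are essentially superimposable.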

Analysis of the structure of chain B of NaVAb(WT) using the MOLE2
algorithm revealed an incomplete water-accessible path extending part
of the way through the voltage sensor from both extracellular and
intracellular sides, which is interrupted at the HCS by R3 (Fig. 2e). Strikingly,
in NaVAb(R3G), the water-accessible path continues all the way
through the voltage sensor, and has a diameter of 2 Å at its narrowest
point, similar to the size of Na+ (Fig. 2h). By contrast, in chain A, R4
was captured in a rotamer conformation in which the arginine side
chain partially blocks the inner end of the gating pore in NaVAb(R3G)
(Extended Data Fig. 4a). Previously reported structures of NaVAb in
the slow-inactivated state show that R4 adopts four slightly different
rotamer conformations, with the most open having a diameter of 3 Å
(Extended Data Fig. 4b). These results elucidate the molecular mechanism
by which mutations in S4 cause pathogenic gating pore currents
and suggest that ion permeation through the gating pore is controlled
dynamically by the state of the voltage sensor and by rotamer
conformations of R4.

In contrast to voltage-gated sodium-channel mutations that cause
normokalaemic periodic paralysis, those that cause hypokalaemic
periodic paralysis result in a channel that conducts gating pore current
in the resting state but is closed in the activated state (Fig. 1). Therefore,
we hypothesized that NaVAb(R2G) would not have a continuous
water-accessible path through its gating pore in the activated state. Analysis
of the 2.9 Å structure of NaVAb(R2G) revealed a gap with additional
solvent-accessible area in the extracellular aqueous cleft in comparison
to the wild-type channel, but no change in the backbone conformation
(Fig. 3a, Extended Data Fig. 3). Although the increased opening of the
aqueous cleft in the voltage sensor is evident in space-filling models
(Fig. 3b), the R3 and R4 side chains seal the voltage sensor in this
activated state, interrupting the transmembrane path and preventing ion
conductance. The solvent-accessible area penetrates about 21 Å into the
membrane from the extracellular side (Fig. 3c), more than 7 Å deeper
than in NaVAb(WT) (Fig. 2e), but it does not reach the cytosolic side.
This structure illustrates why NaVAb(R2G) does not conduct gating
pore current in the activated state (Fig. 1).

There are no crystal structures of the voltage sensor of a voltage-gated
sodium channel in the resting state, because the resting state is only
accessible at negative membrane potentials. However, we developed
models of three resting states using disulfide locking of substituted
cysteine residues and structure prediction with the Rosetta algorithm;
these are now considered consensus models of the actual resting
states. To model an open gating pore with the voltage sensor in the
resting state, we introduced the R2G and R3G mutations into these
resting-state models and analysed the resulting structures with the MOLE2
algorithm (Fig. 3d–f). There is no continuous path through the voltage
sensor in the wild-type resting-state structure (Fig. 3d), whereas the
resting state of the NaVAb(R2G) voltage sensor contains a continuous
water-accessible path through the membrane (Fig. 3e). Loss of the R2
side chain leaves a gap at the HCS that is large enough for Na+ to pass
through (Fig. 3e). By contrast, the transmembrane pathway is incomplete
in NaVAb(R3G) because the R2 side chain occupies the HCS and
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Fig. 3 | Structure of voltage sensor and guanidinium binding site 

of NaVAb(R2G). a–c, Structures of the activated voltage sensor of
NaVAb(R2G). a, Side view with gating charges highlighted in sticks. b, Top
view in space-filling format. The dashed red line indicates the position of
the closed gating pore. c, MOLE2 analysis of water-filled space in magenta.
d–f, Rosetta structural models of resting state 2 of the voltage sensor were
re-optimized with the amino-acid sequence of NaVAb for NaVAb(WT)
(d), NaVAb(R2G) (e) and NaVAb(R3G) (f). The perspective is rotated
approximately 180° around the vertical axis to better illustrate the arginine
gating charges in resting state 2. Green spheres represent missing arginine
side chains of R2 and R3, respectively. Magenta blobs represent
solvent-accessible volume modelled with MOLE2. g, Top view of NaVAb(R2G) with
one guanidinium bound to each voltage sensor. h, 2mFo − DFc electron
density map (blue mesh) of residues around the guanidinium binding site
at 1σ. i, Interaction network between guanidinium and amino acids in
the voltage sensor of NaVAb(R2G). Grey dashed lines show interatomic
distances shorter than 4 Å. See Extended Data Table 1 for details.


blocks the gating pore (Fig. 3f). These structural models illustrate how 
R2 charge mutations that cause hypokalaemic periodic paralysis result 
in gating pore current in the resting state. 

The NaVAb(R2S) mutant channel is much more permeant than
NaVAb(R3G) to guanidinium ions (Fig. 1). Guanidinium ions are
chemically similar to the distal moiety of the arginine side chain, and
guanidine compounds with hydrophobic substituents can block mutant
gating pores. We probed our gating pore structures for guanidinium-binding
sites by soaking crystals of NaVAb(R2G) and NaVAb(R3G)
with guanidinium and methylguanidinium to determine whether
they would bind in place of the missing side chain of R2 or R3. The
crystal structures did not show guanidinium binding to NaVAb(R3G).
However, crystals of NaVAb(R2G) soaked with guanidinium or
methylguanidinium diffracted to 2.7 Å and 2.5 Å resolution, respectively, and
unambiguous electron density was observed in place of each R2 side
chain (Fig. 3g–i, Extended Data Fig. 5a, b). Bound guanidinium is
clearly seen in 2Fo − Fc maps (Fig. 3h). E32 and M29 from S1, N49
from S2, R1 and R3 from S4, and Q150 from an adjacent subunit form
the binding site for guanidinium (Fig. 3i). M29 and R3 each bind
guanidinium through hydrogen bonds (Fig. 3h, i). The carbonyl group of
E32 and the carbonyl oxygen of R1 further lock guanidinium in place
(Fig. 3h, i). The binding site is flanked by hydrogen bonds from N49
and Q150 that stabilize guanidinium from opposite sides (Fig. 3h, i).
The binding site for methylguanidinium is almost identical (Extended
Data Fig. 5c, d). These structures capture guanidinium bound at a
specific site in the closed R2G gating pore. The amino acid residues
responsible for guanidinium binding are highly conserved in NaVAb,
NaV1.4 and CaV1.1 (Extended Data Fig. 1). Substituted guanidinium
ions can block gating pore current without major effects on NaV1.4
function, suggesting that guanidinium-containing compounds
specific for this binding site could provide a basis for structure-based
drug design and be used therapeutically to relieve the symptoms of
hypokalaemic periodic paralysis.

To examine relationships among structural fluctuations of the gating
pore, ionic hydration and Na+ leakage, we performed molecular
dynamics simulations of wild-type and R3G mutant voltage sensors
in a hydrated lipid bilayer (Fig. 4). Multiple unbiased simulation
repeats, with a total duration of 30 μs, show that the overall structures
are conserved. Analysis of axial distributions of water molecules
revealed a narrow region (−5 Å < z < 5 Å) that is more hydrated
in the R3G mutant than in the wild-type voltage sensor, owing to the
larger size of the lumen in the mutant (Fig. 4a–c, yellow; P < 0.002,
see Extended Data Table 2). The average count of water molecules
within the HCS was 3.9 ± 0.8 and 5.3 ± 0.4 for the wild type
and R3G mutant, respectively (Fig. 4e). We performed umbrella-sampling
simulations to compute the free energy of Na+ permeation
along the principal axis of the voltage sensor. When Na+ was within
the HCS, the number of water molecules in the HCS increased to
8.4 ± 0.3 in the wild type and 9.0 ± 0.3 in the R3G mutant.
The free-energy profile for Na+ translocation forms a broad
barrier spanning the HCS, centred at Cα of R3. The R3G mutation
significantly decreases the height of this barrier from 18 ± 0.8 to
11 ± 1.4 kcal mol−1 (Fig. 4e). These values are consistent with the
undetectable gating-pore conductance in the wild type and an upper
limit of around 0.1 pS in the R3G mutant. Analysis of ionic coordination
shows that, at the extracellular edge of the barrier, the first
solvation shell of Na+ is almost exclusively composed of water, consistent
with the hydrophobic nature of the bottleneck in the voltage
sensor (Fig. 4f). The total coordination number of 5.81 ± 0.02 in
bulk water drops to 4.88 ± 0.04 at the peak of the free-energy barrier,
suggesting a large desolvation penalty for Na+ that is partly alleviated
by the cavity created in the absence of the R3 side chain.
Charge-charge repulsion is also likely to contribute substantially
to the higher energy barrier to Na+ permeation in the wild type,
impeding the gating pore leakage observed in the R3G mutant.
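
To give a rough sense of scale for the drop in barrier height from 18 to 11 kcal mol−1, the sketch below applies a simple Boltzmann (Arrhenius-type) factor. This is an illustrative back-of-the-envelope estimate, not the authors' analysis: it assumes equal pre-exponential factors for both channels and a temperature of 300 K, neither of which is stated in the text, and the function name is hypothetical.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1.
GAS_CONSTANT_KCAL = 0.0019872041


def barrier_rate_ratio(barrier_high_kcal: float, barrier_low_kcal: float,
                       temperature_k: float = 300.0) -> float:
    """Ratio of crossing rates (low barrier / high barrier), assuming equal prefactors."""
    thermal_energy = GAS_CONSTANT_KCAL * temperature_k  # RT in kcal/mol
    return math.exp((barrier_high_kcal - barrier_low_kcal) / thermal_energy)


# Barrier heights from the text: wild type 18 kcal/mol, R3G mutant 11 kcal/mol.
ratio = barrier_rate_ratio(18.0, 11.0)
print(f"Estimated rate enhancement in R3G at 300 K: {ratio:.2e}")
```

Under these assumptions a 7 kcal mol−1 reduction corresponds to a rate increase of roughly five orders of magnitude, which is in line with a leak that is undetectable in the wild type but measurable in the mutant. The stated uncertainties on the barriers (±0.8 and ±1.4 kcal mol−1) would shift this estimate considerably, so it should be read as an order-of-magnitude guide only.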

The location of R4 coincides with a secondary shoulder in the
free-energy profiles (Fig. 4d, R108), indicating that movement of Na+
past R4 is not rate-limiting for permeation, even though transit of Na+
past R4 causes the largest displacement of water by protein ligands
(Fig. 4e). Spontaneous disruption of the R4-E59 salt bridge in 3 ± 1%
of simulation frames for the wild type and R3G mutant opens the inner
end of the gating pore with sufficient frequency to support gating pore
current (Fig. 4g–i). Na+ often makes direct contacts with the anionic
side chains of D80 and E59 (Fig. 4f, g), and its movement is coupled to
dynamic rearrangements of the R4 salt-bridge network.

Overall, our results provide an unprecedented high-resolution view 
of functional effects of ion channel mutations that cause periodic 
paralysis and define the structural basis for pathogenesis of this ion 
channelopathy. R2G and R3G mutations do not perturb the backbone 
structure of the voltage sensor, suggesting that the aberrant gating pore 
currents are not caused by conformational changes in transmembrane 
alpha helices. Instead, the absence of the positively charged R2 and R3 
side chains opens an aqueous gating pore that allows diffusion of Na⁺
into the cell, depending on the functional state of the voltage sensor. 
Our structural studies show how this pathogenic gating pore current is 
gated in resting and activated states by transmembrane movements of 
the S4 segment. Although our studies of R2G and R3G mutants suggest 
a straightforward explanation for the pathogenic gating pore current, 
mutations that cause hypokalaemic periodic paralysis and normokalaemic
periodic paralysis that substitute large side chains such as tryptophan
also cause gating pore currents, perhaps by perturbing
the local structure of the voltage sensor and thereby opening a pore
across the membrane.

Fig. 4 | R3G mutation lowers the free-energy barrier for Na⁺
conductance. a, Probability (prob.) distribution of water along the domain
axis for NavAb(WT) (black) and NavAb(R3G) (red). b, c, Representation
of voltage sensor from NavAb(WT) and NavAb(R3G) simulations
where Na⁺ (blue sphere) is restrained at z = −5 Å. The S2 segment
(residues 45–65) is omitted for clarity. Arrows indicate the positions
of R108. d, Axial distribution of gating charge Cα for NavAb(WT) and
NavAb(R3G). The axial position in the crystallographic structure is
shown as a vertical line. e, Inset, probability distribution of water in the
HCS (−5 Å to 5 Å) across all simulations of NavAb(WT) (black) and
NavAb(R3G) (red). The total probability is separated into frames where
Na⁺ occupies the hydrophobic constriction (solid) or is outside this region
(cross-hatched). Nwat, number of water molecules. Main panel, Potential
of mean force for Na⁺ conduction within the NavAb(WT) (black) and
NavAb(R3G) (red) pore computed using umbrella sampling. The HCS is
highlighted in yellow. f, Average coordination of Na⁺ as a function of ionic
position along the principal axis of the voltage sensor, for NavAb(WT)
(solid lines) and NavAb(R3G) (dashed lines). The first coordination shell
of Na⁺ is partitioned for coordination to protein (green), water (blue),
lipid head groups (orange) and counterions (purple). g–i, Representative
snapshots from simulations of NavAb(R3G) depicting conformational
isomerization of R4. *P < 0.002, n = 60; see Extended Data Table 2 for
details. Arrow in f indicates the point of minimum coordination of Na⁺
by water.

Our structures reveal the binding pose of a highly permeant ion,
guanidinium, in the closed gating pore of the activated voltage sensor
of NavAb(R2G). Substituted guanidinium derivatives can block gating
pore current without impairing voltage sensor function in Nav1.4.
Therefore, our high-resolution structural models may provide molecular
templates for design and development of drugs that would mimic
guanidinium, block gating pore current and provide symptomatic relief
of periodic paralysis.


Online content 

Any Methods, including any statements of data availability and Nature Research 
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at
https://doi.org/10.1038/s41586-018-0120-4.

METHODS


Electrophysiology. All experiments were performed using T. ni insect cells (High
Five Cells, Thermofisher). Molecular biology and patch-clamp measurements were
performed as described previously. All constructs showed high-level expression
that enabled us to measure ionic current and gating pore currents 48 h after infection.
Whole-cell sodium currents were recorded using an amplifier (Axopatch 200;
Molecular Devices) with glass micropipettes (2–4 MΩ). The intracellular pipette
solution contained (mM): 35 NaCl, 105 CsF, 10 EGTA and 10 HEPES, pH 7.4
(adjusted with CsOH). The extracellular solution contained (mM): 140 NaCl, 2
CaCl₂, 1.8 MgCl₂ and 10 HEPES, pH 7.4 (adjusted with NaOH).

For NavAb(R2S), the standard clamp protocol for measuring central pore currents
consisted of steps from a holding potential of −200 mV to voltages ranging from
−180 to 0 mV in 10-mV steps. For NavAb(R3G), cells were held at −160 mV and
10-mV voltage steps ranging from −140 mV to +50 mV were applied. A P/−10
or P/−4 leak-subtraction protocol was used to subtract linear leak and capacitive
currents from holding potentials of −200 or −160 mV, respectively.

To measure gating pore currents in NavAb(R2S), cells were held at −200 mV
for ~1 min to allow recovery from slow inactivation. Then, the cells were held at
−100 mV for gating pore current measurements, which inactivates the central
pore current. Depolarizing pulses in 10-mV steps were applied from −200 mV
up to +50 mV. The intracellular pipette solution contained (mM): 140 CsF, 10
EGTA and 10 HEPES, pH 7.4 (adjusted with CsOH). The extracellular solution
contained (mM): 140 NaCl, 2 CaCl₂, 1.8 MgCl₂ and 10 HEPES, pH 7.4 (adjusted
with NaOH). To test gating pore selectivity for different cations, NaCl was replaced
by an equimolar concentration of KCl, CsCl, LiCl, NMDG or 40 mM guanidinium
sulfate + 100 mM NMDG, 40 mM methylguanidinium sulfate + 100 mM NMDG,
or 40 mM ethylguanidinium sulfate + 100 mM NMDG.

To measure gating pore currents in NavAb(R3G), cells were held at 0 mV for
a few minutes to induce slow inactivation. Then, 10-mV pulses were applied from
−200 mV up to +50 mV. To measure ion selectivity of R3G, the external
solution contained (mM): 140 NMDG-MS, 2 CaCl₂ and 10 HEPES. The intracellular
solution contained either 140 mM NaF, 140 mM KF or 140 mM CsF, in addition
to 10 mM HEPES and 10 mM EGTA.

No online leak-subtraction protocols were used while measuring gating pore
currents. Linear leak subtraction was done offline by generating a linear fit to the
I–V curves over the voltage ranges +100 mV to 0 mV for NavAb(R2S) and
−200 mV to 0 mV for NavAb(R3G). Voltage-clamp pulses were generated and currents
were recorded using Pulse software controlling an Instrutech ITC18 interface
(HEKA). Data were analysed using Igor Pro 6.37 software (WaveMetrics).
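
The offline linear leak subtraction described above can be sketched as follows. This is a minimal illustration in plain Python and not the Igor Pro procedure used for the recordings. The function names, the synthetic I–V data and the fit window in the example are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def subtract_linear_leak(voltages_mv, currents_pa, fit_min_mv, fit_max_mv):
    """Fit a line to the I-V points inside the fit window and subtract it everywhere."""
    in_window = [(v, i) for v, i in zip(voltages_mv, currents_pa)
                 if fit_min_mv <= v <= fit_max_mv]
    slope, intercept = fit_line([v for v, _ in in_window],
                                [i for _, i in in_window])
    return [i - (slope * v + intercept) for v, i in zip(voltages_mv, currents_pa)]


# Synthetic I-V curve (made-up numbers): an ohmic leak of 0.05 pA/mV plus an
# extra inward current that appears only at voltages below -100 mV.
voltages = list(range(-200, 1, 10))
leak = [0.05 * v for v in voltages]
extra = [-0.5 * (-100 - v) if v < -100 else 0.0 for v in voltages]
currents = [l + e for l, e in zip(leak, extra)]

corrected = subtract_linear_leak(voltages, currents, fit_min_mv=-100, fit_max_mv=0)
```

In this toy case the fit window contains only the ohmic leak, so the corrected trace is flat inside the window and retains the non-linear component outside it.
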

Protein purification and crystallization. R2G, R2S or R3G mutations were introduced
into NavAb/I217C by site-directed mutagenesis (QuikChange; Agilent) and
confirmed by sequencing. Protein was expressed and purified as described. In
brief, recombinant baculovirus was generated by using the Bac-to-Bac system
(Invitrogen), and T. ni cells (High Five Cells, Thermofisher) were infected for
protein production. Protein was extracted with 1% digitonin (EMD Biosciences).
After centrifugation, the supernatant was agitated with anti-Flag M2-agarose resin
(Sigma). Flag resin was washed and eluted with Flag peptide, and the purified
protein was analysed by SDS–PAGE (Extended Data Fig. 6). Purified protein was
then loaded onto a Superdex 200 column (GE Healthcare) in 10 mM Tris-HCl
pH 8.0, 100 mM NaCl and 0.12% digitonin. The peak fraction was concentrated
to ~17 mg ml⁻¹ and reconstituted into DMPC:CHAPSO (Anatrace) bicelles. The
protein–bicelle preparation was mixed in a 1:1 ratio and set in a hanging-drop
vapour-diffusion format over a well solution containing 1.8–2.0 M ammonium
sulfate, 100 mM Na-citrate pH 4.8–5.2. Crystals grew to full size in a week. Crystals
were cryoprotected in well solution supplemented with 28% glucose (w/v) in increments
of 7% glucose during harvesting. Guanidinium- or methylguanidinium-bound
crystals were cryoprotected by soaking in the same cryoprotection solution
plus 10 mM guanidinium or methylguanidinium ions. Crystals were plunged into
liquid nitrogen for data collection.

Data collection and structure determination. X-ray diffraction data were collected
at the Advanced Light Source (beamlines BL821 and BL822), and then integrated and
scaled with the HKL2000 suite. Both NavAb(R2G) and NavAb(R3G) structures
were solved by Phaser-MR using the NavAb (PDB code: 3RVY) monomer as the search
model. After initial phases were obtained, models were refined with PHENIX and manually
rebuilt using COOT. High-resolution density maps clearly showed no side-chain
density for R2G or R3G. Simulated annealing omit maps were used to confirm the
binding of guanidinium ions. The geometries of the final models were verified
using MolProbity. All solvent-accessible volume analysis in the voltage-sensing
modules was generated with MOLE2.

Molecular modelling and dynamics. Molecular models of the NavAb(WT) and
NavAb(R3G) channels were constructed using the NavAb(I217C) structure (PDB
code: 3RVY). The latter model was generated by substituting R105 with G in
all four voltage-sensing domains. Both systems were embedded in a hydrated
1,2-dimyristoyl-sn-glycero-3-phosphatidylcholine (DMPC) bilayer with ~250 mM
NaCl for a total of ~129,000 atoms. Embedding was performed using the alchembed
protocol using an equilibrated rectangular CHARMM36 DMPC bilayer patch
obtained from the Klauda laboratory website
(https://terpconnect.umd.edu/~jbklauda/). The protein, lipids and ions were
modelled with the CHARMM36 all-atom force field and water molecules were
modelled with TIP3P. NBFIX adjustments were made for Na⁺–backbone carbonyl
O atom and Na⁺–lipid head group interactions.

All simulations were performed with GROMACS 5.0.6. Electrostatic interactions
were calculated using the particle-mesh Ewald algorithm with a real-space cut-off
distance of 1.2 nm, a grid spacing of 0.16 nm and cubic interpolation. Lennard-Jones
interactions were cut off at 1.2 nm. Nonbonded interactions were calculated
using Verlet neighbour lists. All simulations were performed at constant temperature
(300 K) using the Nosé–Hoover thermostat with temperature coupling
of 0.5 ps and at constant pressure (1 atm) with the Parrinello–Rahman barostat
with a time constant of 2 ps. All chemical bonds were constrained using the LINCS
algorithm. The integration timestep was 2 fs.

Because the channel and voltage sensor were initially devoid of water molecules
and ions, a protein-restrained equilibration period of 30 ns was used to
reduce the systematic sampling bias induced by the initial conditions (10 ns with
protein heavy-atom restraints, 10 ns with backbone restraints, and 10 ns with Cα
restraints, all with a force constant of 2.39 kcal mol⁻¹ Å⁻²). Unbiased production
simulations of 15 replicas of ‘WT’ and ‘R3G’ systems were conducted for 1,000 ns
each, resulting in aggregate sampling of 15 μs for each tetramer (4 × 15 μs = 60 μs
for WT and R3G voltage sensors).
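
The sampling totals and the restraint force constant quoted above can be checked with a few lines of arithmetic. The sketch below is only a sanity check with illustrative variable names. The conversion assumes the thermochemical calorie (1 kcal = 4.184 kJ) and shows that 2.39 kcal mol⁻¹ Å⁻² corresponds to roughly 1,000 kJ mol⁻¹ nm⁻², the unit system that GROMACS uses.

```python
KCAL_TO_KJ = 4.184           # thermochemical calorie
ANGSTROM2_PER_NM2 = 100.0    # 1 nm^2 = 100 Å^2

replicas = 15
length_ns = 1_000
sensors_per_tetramer = 4

# Aggregate sampling per tetramer and per set of voltage sensors, in microseconds.
tetramer_sampling_us = replicas * length_ns / 1_000
sensor_sampling_us = sensors_per_tetramer * tetramer_sampling_us

# Number of independent voltage-sensor trajectories pooled per system (n in the statistics).
pooled_sensor_trajectories = replicas * sensors_per_tetramer

# Restraint force constant converted from kcal mol^-1 Å^-2 to kJ mol^-1 nm^-2.
force_constant_kcal = 2.39
force_constant_kj_nm2 = force_constant_kcal * KCAL_TO_KJ * ANGSTROM2_PER_NM2

print(tetramer_sampling_us, sensor_sampling_us, pooled_sensor_trajectories)
print(round(force_constant_kj_nm2, 1))
```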

Simulation snapshots beyond t = 100 ns were extracted from unbiased simulations
and used as initial conditions for biased simulations, using the entire
tetramer. Umbrella sampling was used to compute the free energy or potential
of mean force (PMF) profile for the movement of Na⁺ through the voltage-sensing
domain. The range of the reaction coordinate, −2.0 to 2.0 nm with respect to
the centre of the hydrophobic constriction, was discretized into ~130 unevenly
spaced windows. For each window, biased simulations were initiated with a water
molecule exchanged for Na⁺ in all four voltage sensors. Production simulations
were performed for 70–100 ns per window with a harmonic restraining potential
force constant of 2.39 kcal mol⁻¹ Å⁻² and a flat-bottom cylindrical position
restraint for all four Na⁺ ions simultaneously. The axial position of the permeating
Na⁺ ion, z, was stored every 10 fs and the data from each of the four voltage
sensors were used separately to generate four independent PMF profiles using
g_wham, enforcing cyclic periodicity of the PMF in the bulk (at z = −2.5 nm).
The initial 10 ns were excluded from each umbrella sampling run. We report the
mean PMF over the four voltage sensors with error bars computed using the
standard error of the mean over all four PMFs. The total simulation time for each of
the two systems (WT and R3G) was ~11 μs, yielding a total of ~45 μs of voltage
sensor data.
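
The final averaging step, a mean PMF with standard-error bars over the four voltage sensors, can be sketched as below. This illustrates only the averaging, not the WHAM calculation performed by g_wham. The function name and the PMF values are hypothetical and were chosen simply to give a barrier near the reported wild-type height.

```python
import math
import statistics


def mean_and_sem_profiles(profiles):
    """Pointwise mean and standard error of the mean across profiles on a shared grid."""
    n = len(profiles)
    means, sems = [], []
    for values in zip(*profiles):
        means.append(statistics.mean(values))
        sems.append(statistics.stdev(values) / math.sqrt(n))
    return means, sems


# Made-up PMF values (kcal/mol) on a shared z grid, one list per voltage sensor.
z_nm = [-2.0, -1.0, 0.0, 1.0, 2.0]
pmfs = [
    [0.0, 5.0, 17.0, 6.0, 0.0],
    [0.0, 6.0, 19.0, 5.0, 0.0],
    [0.0, 5.5, 18.5, 5.5, 0.0],
    [0.0, 5.5, 17.5, 5.5, 0.0],
]

mean_pmf, sem_pmf = mean_and_sem_profiles(pmfs)
peak = max(range(len(z_nm)), key=lambda k: mean_pmf[k])
print(f"Barrier: {mean_pmf[peak]:.1f} +/- {sem_pmf[peak]:.1f} kcal/mol at z = {z_nm[peak]} nm")
```

With only four profiles the standard error is itself a coarse estimate, which is worth keeping in mind when comparing barrier heights.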

Water occupancy of the voltage sensor was computed by counting the number
of water oxygen atoms within a cylinder of radius 8.0 Å. We define the hydrophobic
constriction centre as the geometric centre of Cα atoms of residues 22, 57, 84 and
105. The range of the HCS is defined as −5 Å to 5 Å along the axial coordinate of
the voltage sensor. Coordination of Na⁺ to channel ligands, water, ions and lipids
was computed by counting the number of protein, water and lipid O atoms,
as well as Cl⁻ ions, within the first solvation shell of Na⁺ (<3.0 Å). The average
coordination number at a given axial position was computed over all simulation
frames regardless of the subunit, but the total coordination number in bulk water
and at the hydrophobic constriction reported in the text was based on the mean and
standard error of the mean over the four voltage sensors. Analysis of the trajectories
was performed using MDTraj and molecular renderings were generated using
Visual Molecular Dynamics.
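
The two geometric counts defined above can be illustrated for a single frame with plain coordinate tuples. This is a minimal sketch and not the MDTraj-based analysis used for the trajectories. It assumes that the voltage-sensor axis is aligned with z and passes through the constriction centre, and the function names and coordinates are hypothetical.

```python
import math


def count_waters_in_cylinder(water_oxygens, centre, radius=8.0, z_min=-5.0, z_max=5.0):
    """Count water O atoms within `radius` of the z axis through `centre`
    whose z offset from the centre lies between z_min and z_max (all in Å)."""
    cx, cy, cz = centre
    count = 0
    for x, y, z in water_oxygens:
        radial = math.hypot(x - cx, y - cy)
        if radial <= radius and z_min <= z - cz <= z_max:
            count += 1
    return count


def coordination_number(ion, ligand_atoms, cutoff=3.0):
    """Number of ligand atoms closer than `cutoff` Å to the ion (first solvation shell)."""
    return sum(1 for atom in ligand_atoms if math.dist(ion, atom) < cutoff)


# Made-up single-frame coordinates in Å, with the constriction centre at the origin.
centre = (0.0, 0.0, 0.0)
waters = [(1.0, 1.0, 0.0), (7.0, 0.0, 4.9), (9.0, 0.0, 0.0), (0.0, 0.0, 6.0), (0.0, 0.0, -5.0)]
n_wat = count_waters_in_cylinder(waters, centre)

ion = (0.0, 0.0, 0.0)
oxygens = [(2.4, 0.0, 0.0), (0.0, 2.9, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 3.5)]
n_coord = coordination_number(ion, oxygens)
print(n_wat, n_coord)
```

Averaging such per-frame counts over frames and then over the four voltage sensors would give occupancy and coordination values of the kind reported in the text.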

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Coordinates and structure factors have been deposited in the 
Protein Data Bank with the following accession numbers: NavAb(R3G), 6C1E;
NavAb(R2G)-guanidinium, 6C1K; NavAb(R2G)-methylguanidinium, 6C1M;
NavAb(R2G), 6C1P.

Extended Data Fig. 1 | Sequence alignment of the voltage sensor
of NavAb with those of human Nav1.4 homologous domain (D)II,
Nav1.4 DIV, Cav1.1 DII and Cav1.1 DIV. Coloured rectangles represent
transmembrane helices. Black arrows indicate residues that form
the guanidinium binding site, blue arrows indicate the hydrophobic
constriction site and red arrows indicate the conserved intracellular
negative cluster.

Extended Data Fig. 2 | Superposition of the NavAb(WT) voltage sensor
and the Electrophorus electricus (electric eel) Nav1.4 DIV voltage sensor.
a–b, Comparison of the conformations of NavAb(WT) voltage sensor
(orange) and EeNav1.4 voltage sensor DIV (PDB code: 5XSY) (grey) in
side view and top view, respectively. Arginine sensors and hydrophobic
residues in the HCS are labelled and shown with side chains in sticks.

Extended Data Fig. 3 | Superposition of the voltage sensors of
NavAb(WT) and mutant channels. a–b, Voltage sensor structure
alignment between NavAb(WT) (grey) and NavAb(R3G) (green) in side
view and top view, respectively. c–d, Voltage sensor structure alignment
between NavAb(WT) (grey) and NavAb(R2G) (cyan) in side view and top
view, respectively. Arginine sensors and hydrophobic residues in the HCS
are labelled and shown with side chains in sticks.

Extended Data Fig. 4 | R4 side chain conformational changes.
a, Different conformations of the R4 rotamer in NavAb(R3G) chain
A (green) and chain B (orange). b, Different conformations of the R4
rotamer in the four subunits of NavAb in the slow-inactivated state (PDB
code: 4EKW).

Extended Data Fig. 5 | Electron density maps for bound guanidinium
and methylguanidinium ions. a, 2mFo−DFc electron density map (blue
mesh) of residues around the methylguanidinium binding site at 1σ.
b, Overlay of guanidinium binding site (green) and methylguanidinium
binding site (orange). c–d, Simulated annealing map (Fo−Fc) contoured at
3σ for methylguanidinium and guanidinium, respectively.

Extended Data Fig. 6 | Purification of NavAb(R3G). a, Representative
gel-filtration chromatography of NavAb(R3G); highlighted peak fractions
were concentrated for crystallization. b, Concentrated sample was
visualized on SDS–PAGE by Coomassie blue staining.

Extended Data Table 1 | Data collection and refinement statistics

                      NavAb/R3G             NavAb/R2G             NavAb/R2G             NavAb/R2G
                                            Guanidinium           Methylguanidinium     Apo
Data collection
Space group           I222                  I222                  I222                  P2 22)
Cell dimensions
  a, b, c (Å)         126.8, 127.0, 192.3   126.6, 126.6, 191.8   126.3, 126.2, 191.6   125.5, 125.6, 192.0
  α, β, γ (°)         90, 90, 90            90, 90, 90            90, 90, 90            90, 90, 90
Wavelength (Å)        0.99994               0.99994               0.99994               0.99994
Resolution (Å)        50–2.90 (3.00–2.90)   50–2.70 (2.80–2.70)   50–2.55 (2.64–2.55)   50–2.80 (2.90–2.80)
Rpim                  4.6 (62.6)            4.0 (62.0)            3.9 (64.0)            5.3 (58.1)
I/σI                  16.6 (1.5)            18.5 (1.2)            18.5 (1.0)            14.5 (0.8)
Completeness (%)      100 (99.9)            99.6 (96.5)           99.4 (95.0)           98.0 (81.6)
Redundancy            7.3 (7.2)             7.1 (5.4)             5.3 (3.8)             5.1 (3.2)
Refinement
Resolution (Å)        42.50–2.86            42.31–2.70            42.31–2.52            48.46–2.90
No. reflections       35059                 41173                 51039                 67766
Rwork/Rfree           21.25/23.99           20.98/24.59           20.31/22.66           23.35/26.03
No. atoms
  Protein             3606                  3605                  3673                  7160
  Ligand/ion          512                   449                   660                   415
  Water               0                     )                     35                    0
B-factors
  Protein             108.7                 97.8                  103.1                 112.89
  Ligand/ion          128.2                 107.5                 130.8                 115.8
  Water               -                     54.5                  75.5                  -
R.m.s. deviations
  Bond lengths (Å)    0.010                 0.010                 0.009                 0.012
  Bond angles (°)     1.311                 1.215                 1.253                 1.703
Ramachandran plots
  Favored             93.2%                 92.5%                 94.0%                 92.1%
  Allowed             6.8%                  7.3%                  5.4%                  7.4%
  Outliers            0.0%                  0.2%                  0.6%                  0.5%

Extended Data Table 2 | Statistical analysis of voltage sensor water occupancy from molecular simulations

Degrees of freedom: 60+60-2

Axial interval (Å, Å)   t-statistic   q-value
(-20,-19)                 1.414       2.221E-01
(-19,-18)                 2.878       1.024E-02
(-18,-17)                 4.389       7.674E-05
(-17,-16)                 4.802       1.696E-05
(-16,-15)                 4.545       4.465E-05
(-15,-14)                 4.249       1.148E-04
(-14,-13)                 2.181       5.422E-02
(-13,-12)                 0.740       5.420E-01
(-12,-11)                 0.217       8.721E-01
(-11,-10)                 2.760       1.276E-02
(-10,-9)                  4.283       1.078E-04
(-9,-8)                  -0.110       9.364E-01
(-8,-7)                   1.914       9.279E-02
(-7,-6)                  -5.668       4.626E-07
(-6,-5)                  -5.839       2.674E-07
(-5,-4)                  -9.376       6.032E-15
(-4,-3)                 -12.075       9.500E-21
(-3,-2)                 -11.945       9.674E-21
(-2,-1)                 -10.018       2.422E-16
(-1,0)                   -5.812       2.674E-07
(0,1)                    -7.910       1.027E-11
(1,2)                     1.488       1.993E-01
(2,3)                     1.813       1.073E-01
(3,4)                     2.797       1.205E-02
(4,5)                     5.497       9.078E-07
(5,6)                     2.476       2.672E-02
(6,7)                    -3.688       8.593E-04
(7,8)                    -8.074       5.198E-12
(8,9)                    -3.257       3.462E-03
(9,10)                   -1.302       2.522E-01
(10,11)                   1.220       2.809E-01
(11,12)                   0.482       7.211E-01
(12,13)                  -2.870       1.024E-02
(13,14)                   1.919       9.279E-02
(14,15)                  -0.003       9.978E-01
(15,16)                   0.887       4.569E-01
(16,17)                   1.843       1.044E-01
(17,18)                   1.319       2.522E-01
(18,19)                   0.263       8.686E-01
(19,20)                   0.249       8.686E-01

Results of two-way t-tests on differences in average water count in 1-Å segments of the voltage sensor axial coordinate comparing wild-type and R3G mutant simulations. In each segment, we compare
the mean of 60 values (n = 60, obtained from pooling the mean water counts of the four voltage sensor proteins from each of 15 simulation repeats). The HCS region (−5 to 1 Å, bold) has the largest
effect size, indicating a region of biological significance.
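
For a single axial segment, the comparison summarized in this table can be sketched as a two-sample t statistic with pooled variance, which has 60 + 60 − 2 degrees of freedom when each group contains 60 pooled values. This is a minimal illustration and not the authors' analysis script. It computes only the statistic, not the tabulated significance values, and the function name and water counts are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom); t is negative when sample_a has the lower mean.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, df


# Made-up mean water counts for one 1-Å segment: 15 repeats x 4 voltage sensors each.
wild_type = [3.0 + 0.1 * (k % 5) for k in range(60)]
r3g = [5.0 + 0.1 * (k % 5) for k in range(60)]

t_stat, dof = pooled_t_statistic(wild_type, r3g)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

With wild type passed as the first sample, a negative statistic means more water in the mutant, matching the sign of the entries in the constriction region of the table.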
TECHNOLOGY FEATURE


SINGLE-CELL APPROACHES
TO IMMUNE PROFILING


Protein- and sequencing-based technologies are helping
researchers to profile immune cells ever more deeply.


Analysing individual white blood cells can help scientists to identify their specific role in fighting disease. 


BY ESTHER LANDHUIS 


The human immune system is a vast,
decentralized army. With billions of
specialized cells in constant motion, it’s
incredibly difficult to work out which ones are
doing what in any given place or situation. But
advances in proteomics and genomics technologies
are helping researchers to catalogue
the various parts. It is a massive undertaking
that is likely to produce 100 billion times more
data than the Human Genome Project.

For decades, immunologists have relied 
heavily on flow cytometry, a technique that 
involves labelling cells with different fluores- 
cent markers. Over the past eight to nine years, 
mass cytometry, which uses metal ions to label
the cells, has offered more-detailed glimpses
into immune responses to diseases such as 
cancer, tuberculosis and malaria. 

Yet these techniques barely scratch the 
surface in determining the precise work of 
T cells and B cells. These cells are adorned 
with unique protein molecules — known as 
T-cell receptors and B-cell receptors — that 
are exquisitely designed to “recognize specific
targets and respond to them in an evolutionary
way”, says Jennifer Sims, a molecular biologist
at the Memorial Sloan Kettering Cancer
Center in New York City. Immune cells are 
almost as varied as people are, she says. 

These surface receptors recognize spe- 
cific molecular features, or antigens, on 
pathogenic organisms or tumours. When a
receptor detects a harmful antigen, it triggers
the immune cell to multiply and mobilize for 
attack. Every receptor is distinct, encoded by 
combinations of gene segments that are shuf- 
fled and recombined as the immune cell devel- 
ops. Last month, at the annual meeting of the 
American Association for Cancer Research 
in Chicago, Illinois, Sims chaired a session on 
methods for analysing the T-cell repertoire — a 
set of disease-fighting cells that churn out up 
to 10° unique receptors over a lifetime (see ‘A 
focus on immune receptors’). 

By studying these receptors — and their 
B-cell analogues — researchers hope to 
learn how the immune system responds and 
evolves during disease. Whereas flow cytometry
can reveal the coarse details of how
different cell populations rise and fall, the
immune-receptor profiling methods can pin- 
point the B-cell and T-cell ‘clones’ involved. To 
study these specialized receptors and further 
expand the power of single-cell analyses, a 
wave of sequencing-based methods is rippling 
through the field, says Alex Shalek, a chemi- 
cal physicist at the Massachusetts Institute of 
Technology (MIT) in Cambridge, whose lab 
builds tools to study how cells interact in healthy 
and diseased states. Just as DNA’s four ‘letters’ 
can encode any gene in the human body, using 
unique DNA snippets to tag antibodies paves 
the way for experiments surveying a theoreti- 
cally infinite number of proteins, Shalek says. 

Plus, such technologies enrich immune 
analyses by allowing researchers to also meas- 
ure the transcriptome — the complete set of 
expressed genes — in the same cells. Although 
it is less established than mass cytometry,
sequencing-based immune profiling can be 
done in any lab, hospital or clinic with a DNA 
sequencer, potentially expanding the technol- 
ogy’s reach. And researchers can now choose 
from at least half a dozen commercial systems 
and do-it-yourself methods for converting 
RNA and proteins from individual cells into 
strings of letters that can be read on a standard
DNA sequencer. 

Collectively, these new technologies are 
fine-tuning scientists’ understanding of how 
the various different immune cells act in the 
microenvironment during disease states, says 
Sims. 


FLUORESCENCE TO METALS 

Flow cytometry is an optical technique that 
classifies cells on the basis of properties such as 
size, granularity and the presence of signature
proteins labelled with fluorescent antibodies.
‘Flow’ refers to the mechanics of the technique:
cells flow in single-file past a series of lasers 
and detectors, which read them as they pass. 

The technique has been a staple of immu- 
nology for decades, but it has shortcomings. 
The visible-light spectrum limits most experi- 
ments to no more than a dozen or so protein 
markers — too few for analyses that involve 
small numbers of cells or complex phenotypes. 

In 2009, chemist Scott Tanner, then at 
the University of Toronto in Canada, came 
up with a solution: mass cytometry (often
called CyTOF) blends flow cytometry with 
mass spectrometry, using metal-conjugated 
antibodies to boost the number of detectable 
markers to 50 or so. 

The technique gained popularity after geneticist
Garry Nolan and his colleagues at
California’s Stanford University used it to
measure 34 parameters at once in human
bone-marrow cells, helping them to track
simultaneously how a wide variety of immune
cells were responding to different drugs. It
“lets us dive into really complicated systems
with single-cell heterogeneity and rare [cell]
populations, and I don’t need to know ahead of
time where the action is happening”, says Sean
Bendall, a Stanford biochemist who worked
with Nolan on the profiling. CyTOF was originally
commercialized by DVS Sciences, which
was acquired by Fluidigm in South San Francisco,
California, in 2014.

“It’s a very fast-evolving field with a lot of innovation.”

Researchers doing flow cytometry will typi- 
cally choose a small set of protein markers to 
stain each kind of immune cell — for example, 


A focus on immune receptors 


Trying to pinpoint which of the trillions of 
T-cell and B-cell receptors in the immune 
repertoire contribute to a person’s disease 
might seem like an impossible task. But 
specialized sequencing approaches are 
enabling researchers to give it a try. About 
half a dozen companies now offer services 
or kits for reading out the shuffled gene 
sequences of T-cell and B-cell receptors 
from cell samples. By tallying those 
sequences and watching how they change, 
researchers can track which clones of cells 
have key roles in disease. 

Adaptive Biotechnologies in Seattle, 
Washington, offers the whole process. 
Researchers can send in chunks of frozen 
tissue and pay for the DNA isolation. They 
can also submit purified DNA or RNA 
samples and have the company run the 
analyses and send back processed data:
spreadsheets with lists of nucleotide
sequences identified in that run, and their
frequencies. Other companies support a 
more do-it-yourself approach. ArcherDx in 
Boulder, Colorado; Takara Bio in Mountain 
View, California; iRepertoire in Huntsville, 
Alabama; and New England Biolabs in 
Ipswich, Massachusetts, sell multiplex PCR 
primers that researchers can use to amplify
T- and B-cell receptor gene segments from 
their own samples. 

Each system has pros and cons. 
Some are great for frozen tissue, others 
for formalin-fixed, paraffin-embedded 
samples. Still others are sensitive to 
interference from tumour-cell DNA. One 
challenge is shared by all platforms: it 
can be hard to distinguish real biological 
changes, such as a small expansion of 
certain cells during an immune response, 
from the technical errors that arise from 
use of the enzymes. E.L. 
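
The tallying idea described in this box can be sketched in a few lines. The example below is a toy illustration and not any company's pipeline: it counts how often each receptor sequence appears in two samples and reports the change in frequency. The function names and the short nucleotide strings are hypothetical.

```python
from collections import Counter


def clone_frequencies(sequences):
    """Map each receptor sequence to its fraction of all reads in the sample."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def frequency_changes(before, after):
    """Change in frequency (after minus before) for every sequence seen in either sample."""
    freq_before = clone_frequencies(before)
    freq_after = clone_frequencies(after)
    all_sequences = set(freq_before) | set(freq_after)
    return {seq: freq_after.get(seq, 0.0) - freq_before.get(seq, 0.0)
            for seq in all_sequences}


# Made-up reads from two time points; each string stands for one receptor sequence.
before = ["TGTGCCAGC"] * 2 + ["TGTAGTGCT"] * 8
after = ["TGTGCCAGC"] * 6 + ["TGTAGTGCT"] * 4

changes = frequency_changes(before, after)
most_expanded = max(changes, key=changes.get)
print(most_expanded, round(changes[most_expanded], 2))
```

A small shift like the one in this example is exactly the kind of signal that can be hard to separate from amplification errors, so real analyses need replicates and error correction that this sketch leaves out.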


they may use CD19 to identify B cells, and CD4 
or CD8 for T-cell subsets. However, this pro- 
vides only a coarse level of resolution because 
the excitation and emission spectra of the fluo- 
rescent dyes overlap and muddy the picture. 
So when Asya Rolls, a neuroimmunologist at 
the Technion Israel Institute of Technology in 
Haifa, wanted a comprehensive look at all the 
immune-cell populations in the mouse brain, 
she decided to use CyTOF instead.

She surveyed 44 antibodies simultaneously, 
and the approach yielded a surprise: a T-cell 
subpopulation with unusually high expres- 
sion of the surface protein CD86 (ref. 4). CD86 
isn’t usually found on T cells in the rest of the 
body — it typically decorates the surface of 
other immune cells and acts to regulate T-cell 
activation — so the build-up in the brain was 
intriguing. Mass cytometry is a good tool for 
discovery, Rolls says. 

Once mass cytometry reveals which cells 
and molecules to focus on, however, a flow 
cytometer often proves more useful for fol- 
low-up analysis. Flow cytometry is faster: it 
can sort more than 10,000 cells per second, 
whereas mass cytometry handles only about 
1,000. CyTOF experiments also tend to use 
many more antibodies per sample, so any mis- 
takes are costlier. And if you will still need the 
cells after the analysis, CyTOF is not an option 
because it vaporizes them before reading their 
metal tags. 

But in its favour, mass spectrometry can 
target many more parameters per cell — 
including chemical compounds, collectively 
called the epigenome, that attach to DNA and 
influence how a cell uses its genetic instruc- 
tions. Targeting the epigenome, a Stanford 
team found last month that immune cells from 
older people have much greater epigenetic 
heterogeneity than do those of younger adults.
These data support the long-held assumption 
that the human immune system deteriorates 
with age because of cell-to-cell variations in 
gene expression. 


DNA-BASED PROFILING 

The initial wave of mass-cytometry papers in 
the early part of this decade caught the atten- 
tion of Adam Abate. Abate is a physicist at the 
University of California, San Francisco, who 
builds tools known as microfluidics devices,
which are used to miniaturize biological
experiments and run them in parallel.
“I was reading Garry Nolan’s papers,” he says.
“We had journal clubs on them.” Those studies
convinced Abate of the value of simultaneously 
analysing many parameters in the same cell, 
a process known as multiplexing. The advent 
of mass cytometry had already boosted the 
number of protein markers it was possible to 
analyse in one shot, from 10-15 to perhaps 
as many as 100. But Abate started asking, 
“why stop at a hundred? Why not go bigger?” 
Human cells have around 20,000 protein-coding
genes, and more than 100,000 protein variants.
“We need more multiplexing,” he decided.

Adaptive immune cells, such as this mature B cell, can show a staggering diversity of antigen receptors.


So Abate looked at trying a completely 
different kind of molecular tag: short DNA 
sequences. The four-letter genetic alphabet 
can be used to create an enormous number of 
unique sequences, known as barcodes, for cells 
or molecules of interest. Abate stained cells 
with DNA-labelled antibodies and shuttled 
them into a machine that broke the cells open 
and spliced the antibody’s barcode to a second 
barcode that identifies the cell from which it 
came. He could then analyse those linked bar- 
codes using a DNA sequencer. To get single- 
cell resolution, he turned to microfluidics. His 
team developed a method called Abseq, which 
injects individual antibody-stained cells into 
10-micrometre droplets that contain unique 
DNA barcodes for both the antibody and cell.
Abseq works more slowly than flow cytometry, 
but it can survey a theoretically limitless num- 
ber of proteins in individual cells. And that 
could help to pinpoint which cells are driving 
immune activity in various diseases. This is 
especially important for diseases such as can- 
cer, in which each tumour has unique features 
that could engage — or suppress — different 
immune cells. 
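
The barcoding logic can be made concrete with a toy example. The sketch below is an illustration of the general idea only and is not the Abseq software: it shows how quickly the number of possible barcodes grows with length, and how sequenced pairs of cell barcode and antibody barcode could be turned into per-cell protein counts. The function names, barcode sequences and the antibody lookup table are hypothetical.

```python
from collections import defaultdict


def barcode_capacity(length):
    """Number of distinct DNA sequences of a given length (4 letters per position)."""
    return 4 ** length


def count_proteins_per_cell(reads, antibody_to_protein):
    """Tally proteins per cell from (cell_barcode, antibody_barcode) read pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for cell_barcode, antibody_barcode in reads:
        protein = antibody_to_protein.get(antibody_barcode)
        if protein is not None:  # skip reads whose antibody barcode is not in the panel
            table[cell_barcode][protein] += 1
    return {cell: dict(counts) for cell, counts in table.items()}


# Made-up antibody panel and sequencing reads.
antibody_to_protein = {"ACGTAC": "CD19", "TTAGGC": "CD4"}
reads = [
    ("CELL01", "ACGTAC"),
    ("CELL01", "ACGTAC"),
    ("CELL02", "TTAGGC"),
    ("CELL02", "GGGGGG"),  # unknown antibody barcode, ignored
]

counts = count_proteins_per_cell(reads, antibody_to_protein)
print(barcode_capacity(8), counts)
```

Because each added base multiplies the number of available barcodes by four, even short tags give far more labels than the dozens of channels available to fluorescence or metal-based detection.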

However, the real power of Abseq lies in 
its potential to integrate with other single-cell 
methods, says Sam Kim, a postdoc in Abate’s 
lab who helped develop Abseq. Because it 
generates information that is readable by 
sequencing, it can assign multiple types of 
‘omics data — proteomes, transcriptomes, 
even disease-causing genetic mutations — to 
each cell, Kim says. 

Mission Bio, a biotechnology company 
in South San Francisco, licensed the Abseq 
technology to build just such a multi-omics 
system. At a meeting of the American Society 
of Human Genetics in Florida last October, 
the company launched a first-generation 
system for analysing gene mutations, and it is 
now developing protein- and RNA-profiling 
components. BD Biosciences in San Jose, Cali- 
fornia, is developing similar technology: last 
September, it released its Rhapsody system for 
single-cell profiling of predefined or custom 
gene panels. Rather than probe the whole tran- 
scriptome, BD’s technology focuses on several 
hundred transcripts, which cuts the number 
of sequencing reads about tenfold, says Nikhil 
Rao, global product manager for the company’s 
Rhapsody system. “You save a tonne of money 
if you don’t waste sequencing on genes you’re 
not interested in.” The company is scheduled 
to expand the Rhapsody platform to include 
protein detection with its own AbSeq assays 
— developed independently of Abate — later 
this year. 


DO-IT-YOURSELF APPROACHES 

Single-cell transcriptome analyses can also 
be used to profile the immune system. These 
methods offer “a tremendous opportunity 
to look genome-wide at what a cell is try- 
ing to express — what it’s thinking at a given 
moment”, says Shalek. 

Both commercial and homegrown options 
are available. In 2015, two teams at Harvard 
Medical School reported on their development 
of methods, called inDrop and Drop-seq, 
which use microfluidics devices to 
analyse RNA expression genome-wide from 
thousands of single-cell droplets in parallel. A 
paper published in February describes how 
to build a compact version of Drop-seq from 
3D-printed parts. 

On the commercial side, the California 
company 10X Genomics offers a high-end 
instrument that can convert cell samples into 
sequencing-ready data in a day, and Black- 
trace Holdings in Royston, UK, offers both an 
all-in-one system and modular components 
for building your own. 

Still, droplet-based methods have drawbacks. 
They don’t work for clinical samples that contain 
only a small number of cells. And reagents 
are wasted when loading cells at low densities to 
ensure only one goes into each droplet. 

Shalek and MIT chemical engineer 
Christopher Love worked through these 
issues to create a single-cell RNA-sequencing 
system called Seq-Well, which floats cells 
onto a silicone array containing 86,000 sub- 
nanolitre wells. “Think of it as a big ice-cube 
tray,” Shalek says. BD’s Rhapsody system uses a 
similar approach, capturing cells in tiny wells. 

A key advantage of Seq-Well is its portabil- 
ity, Shalek says. Members of his lab have taken 
the device to Thailand to help MIT colleagues 
analyse cells infected by the malaria parasite 
Plasmodium vivax. They have also trained a 
team at the Africa Health Research Institute 
in Durban, South Africa, to use Seq-Well to 
identify infected immune cells in lymph-node 
samples taken from people with HIV. 

Strategies for collecting multiple omics data 
sets are moving forwards, too. Last year, for 
instance, two teams independently published 
methods, called REAP-seq and CITE-seq, 
for analysing proteins and messenger RNA 
simultaneously in single cells. The New York 
University Langone Medical Center and the 
West German Genome Center are gearing up 
to offer similar services. Researchers can send 
in samples, “and we will do the single-cell analyses”, 
says Pratip Chattopadhyay, who directs 
the New York unit. The German centre, a new 
multi-institution collaboration headquartered 
in Bonn, is mobilizing a panel of experts for 
consultation on the technologies. 

“It’s a very fast-evolving field with a lot of 
innovation,” says Joachim Schultze, a tumour 
immunologist at the German Center for 
Neurodegenerative Diseases in Bonn. “That 
also means nobody has any idea where this 
will be in ten years — which technologies will 
make it, or how many we will have.” 


Esther Landhuis is a freelance science 
journalist in the San Francisco Bay Area, 
California. 
Researchers who choose to study or work abroad must adapt to customs, hierarchies and expectations that can differ greatly from those they are used to. 


WORKING ABROAD 


Globetrot for science 


How to navigate cultural variances in a lab outside your native country. 


BY ROBERTA KWOK 


Punam Amratia grew up in Kenya, where 
she attended a British school. She then 
completed her bachelor’s degree in statistics 
and a master’s in epidemiology in the United 
Kingdom. In both nations, she was accustomed 
to people being blunt. If teachers didn’t like her 
work, they were upfront about it. “They told me, 
‘It’s crap. Go back and do it again,’” she says. 

So when Amratia started her PhD in malaria 
epidemiology in 2014, she did not sugarcoat 
her own opinions. During her first month at the 
University of Florida in Gainesville, her laboratory 
met to discuss a paper that her supervisor 
was reviewing. Amratia recalls calling the 
paper “shit” and saying that it should not be 
published. A US lab mate described the paper’s 
positive attributes and outlined items that could 
be improved. Amratia’s colleagues suggested to 
her that she could be more diplomatic. After 
the meeting, she tried to temper her directness. 

But Amratia also faced problems socializing 
with her PhD-committee members, partly 
because she didn’t understand US popular- 
culture references. At a social event, while 
her colleagues laughed about a popular 1980s 
TV show, she stayed quiet because she was 
unfamiliar with it. Such incidents made it hard 
for her to establish a casual rapport with these 
faculty members, so she did not feel comforta- 
ble asking them for advice during her doctoral 
programme. 

Plenty of opportunities exist to study 
and work abroad. But some early-career 
scientists might face challenges adapting to 
different communication styles and different 
workplace and academic hierarchies. Super- 
visors and junior researchers can reduce the 
risk of misunderstandings by actively learning 
about each other's cultures and communicating 
workplace expectations clearly. 

It is important both to be sensitive to cultural 
differences and to avoid inadvertently stereo- 
typing. Nanda Dimitrov, director of Western 
University’s Teaching Support Centre in 
London, Canada, who has written about 
cross-cultural graduate supervision, says that 
it’s important not to make assumptions about 
students solely on the basis of their culture. A 
wealthy Chinese student from Hong Kong, 
for example, could see things differently from 
one who comes from a rural area on the mainland, 
she notes. And individual perceptions 
can vary: Amratia says that she personally 
encountered directness more often in the 
United Kingdom than in the United States, 
but others might not have experienced this. 
Dimitrov points out that the relationship 
between junior researcher and supervisor is 
influenced by many factors including personal- 
ity, previous experiences and the department's 
and discipline’s workplace culture. 

Moving abroad to study or work has become 
more common, partly owing to encouragement 
from national governments and funding agen- 
cies. The number of students pursuing higher 
education abroad rose from 1.7 million in 
1995 to 4.1 million in 2013, according to the 
UNESCO Science Report: Towards 2030 (see 
go.nature.com/2wfvwyq), published in 2015 
by the United Nations Educational, Scientific 
and Cultural Organization (UNESCO) in 
Paris. European funders promote the prac- 
tice through programmes such as the Marie 
Skłodowska-Curie actions, which provide 
grants for researchers studying or working out- 
side their home countries. From 2003 to 2010, 
the Chinese government increased the number 
of scholarships for studies outside China from 
fewer than 3,000 to more than 13,000. Accord- 
ing to the UNESCO report, the regions with the 
highest proportions of students seeking higher 
education elsewhere are central Asia, Arab 
states, sub-Saharan Africa and Western Europe. 
The most popular destination for all outbound 
PhD students is the United States, where about 
half of international science and engineering 
PhD students are enrolled, followed by the 
United Kingdom, France and Australia. 

Many labs experience few problems with 
cultural differences. “Science workplaces are 
so international,” says Kaisa Kajala, a Finnish 
plant biologist at Utrecht University in the 
Netherlands who has studied or worked in 
the United Kingdom, Australia and the 
United States. “People are pretty understanding 
of different cultures.” 

When misreadings do arise, however, it’s 
important to address them, because the stakes 
are often high for international students. If 
research takes longer than expected, they 
might have difficulty extending visas or paying 
tuition fees. Principal investigators (PIs) who 
make invalid assumptions about a student’s 
intentions might write tepid letters of recom- 
mendation for job applications or decide not to 
collaborate with that person after graduation. 

Expectations for behaviour in areas such as 
leadership, communication and feedback style 
can vary across cultures. “While it is important 
never to stereotype, if we don’t try and under- 
stand how cultures differ, we are missing a huge 
part of what shapes how people act and how 
misunderstandings can arise,” says Andrew 
Spencer, director of Rose Window Consult- 
ing near Oxford, UK, who has run training 
programmes for international businesses 
(including Springer Nature) on managing 
and communicating across cultural bounda- 
ries. For instance, he says, in the Netherlands, 
negative feedback tends to be given clearly and 
unambiguously, whereas in Japan, criticism 
might be more indirect. Spencer recommends 
The Culture Map: Breaking Through the Invisible 
Boundaries of Global Business by Erin Meyer 
(PublicAffairs, 2014) for more examples. 

One point of difference that can arise is the 
appropriate level of deference to supervisors. 
Dimitrov says that some Nigerian, Egyptian 
and Chinese international students report 
that in their home nations, a large power 
differential between students and teachers 
is common, and that students generally 
follow instructions without arguing. But 
a supervisor from a country where debate 
is expected might sometimes incorrectly 
interpret a lack 
of questioning from the student as a lack of 
interest in the work, says Theresa Winchester- 
Seeto, an independent consultant on higher 
education in Sydney, Australia. 

Keshun Zhang faced this issue after moving 
from China to the University of Konstanz in 
Germany to pursue his PhD in psychology. He 
was used to following teachers’ suggestions. But 
“the culture in Germany always encourages you 
to argue, to fight for yourself”, says Zhang, now 
a postdoc at the university and co-author of the 
2016 book When a Chinese PhD Student Meets 
a German Supervisor: Tips for PhD Beginners 
(University of Konstanz). With his supervisor's 
encouragement, he started pushing back — for 
instance, if he thought data should be analysed 
using a different statistical method, he would 
say so. After his first year, his supervisor said, 
“Wow, in this one year, finally you have learnt 
to say no,” Zhang recalls. 

Zhang also realized that he was expected to 
work more independently than he had during 
his master’s programme in China. In Germany, 
he peppered a postdoc with frequent questions 
about issues such as statistical methods. His 
supervisor urged him to try to solve problems 
on his own and to seek guidance only if he 
became stuck. Zhang initially found this 
approach difficult but came to prefer it. 

The absence of a strictly defined hierarchy 
can encourage freer communication, says 
Salim Reza, a radiation-detector scientist at 
Mid Sweden University in Sundsvall. When 
he moved from his native Bangladesh to 
Sweden for graduate studies, he learnt that he 
did not need to address faculty members as 
‘sir’ or ‘professor’, or remain standing in their 
offices. This informality made it easier for him 
to approach professors to clarify a topic or to 
propose a new research angle. “I could go with 
the craziest idea to my teacher, and he would 
explain why it was good or bad,” he says. 

Supervisors also can clarify expectations for 
the student’s responsibilities when they arrive. 
The Western Guide to Graduate Supervision 
(University of Western Ontario Teaching 
Support Centre, 2008) includes a rating scale 
that asks questions such as whether the student 
or supervisor will select the research topic and 
decide on methods. A student and PI could fill 
out the form together, discuss their answers and 
resolve discrepancies, Dimitrov says. 

Davor Solter, a mammalian-developmental 
biologist from Croatia who now splits his 
time between New Mexico and Maine, says 
that he never encountered problems owing to 
cultural differences as a PI in the United States, 
Germany and Singapore. His lab members’ 
countries of origin were irrelevant to their 
performance, he says. 


MIXED MESSAGES 
Sometimes, though, misunderstandings can 
stem from differences in communication style. 
In some countries, the ‘feedback sandwich’ 
is common, Dimitrov says: start with praise, 
suggest improvements and end with encour- 
agement. Students from countries where this 
format is less common might think that because 
comments were mostly positive, the suggestions 
are optional and can be ignored, she says. To 
avoid such mishaps, students could write an 
e-mail after each meeting summarizing the 
feedback and next steps so that the supervisor 
can correct their interpretation if needed. 
Conversely, a student who is accustomed 
to gentler feedback might be ‘traumatized’ 
by cultural tendencies in other countries, 
such as Germany or the Netherlands, to give 
more direct criticism, Dimitrov says. Senior 
researchers could smooth over differences by 
discussing how the student prefers to receive 
comments, she suggests. Students could also 
talk to lab mates about the feedback; hearing 
others’ stories of critiques — and positive out- 
comes, such as getting a paper accepted — could 
help them to overcome discouragement. 
Some researchers are surprised by their 
host country’s e-mail practices. Laetitia 
Wilkins, a postdoc in evolutionary biology 
at the University of California, Berkeley, 
says that in her native Switzerland, people 
typically reply to nearly every work-related 
message. “It’s almost mandatory that you 
answer,” she says. In California, she found a 
more relaxed approach. When she e-mailed 
another researcher to request a data set and 
heard nothing, she wondered if she had 
done something wrong and the person did 
not want to collaborate. She did not send a 
follow-up message because, she says, that 
would be considered rude in Switzerland. 
Eventually she realized that ignoring 
or ‘losing’ e-mails was common in the 
United States. Now, if a message goes unanswered, 
she sends another e-mail a few days 
later and usually gets a friendly response. 

Other scientists find that the priorities 
attached to socializing differ from what they 
are used to. In Reza’s experience, people in 
Bangladesh do not usually take coffee breaks 
at work. But in Sweden, he realized that it 
was important to attend fika, coffee breaks 
that the department usually took each day. 
“We are obsessed with fika,” he says. The 
gatherings provide opportunities to make 
social plans, hear what other groups are 
doing and discuss research issues. 

Some scientists welcome the change 
of pace offered by a fresh environment. 
Ecologist Christine Lucas moved from the 
United States to Uruguay for a postdoc 
position and is now a faculty member at 
the University of the Republic in Paysandú. 
She found that her new colleagues tended to 
keep a better work-life balance. When she 
became pregnant, she easily postponed the 
start of her postdoc by a few months; once 
she began, she had flexible hours, could 
sometimes work from home and designed 
a project that required little fieldwork. 
Friends who did postdocs in the United 
States seemed, to her, under more pressure 
to work with lab members face-to-face and 
to meet tough publication targets. And, she 
adds, it is acceptable for her department 
meetings in Uruguay to start 5–10 minutes 
late — allowing for unexpected obstacles 
such as transport strikes. 

Whether they are welcoming interna- 
tional students or starting work in new 
countries, scientists can ease the transition 
by remaining non-judgemental. People 
sometimes brush off a student from another 
country as ‘rude’, but “in their culture, they’re 
not”, Amratia says. Researchers should 
also remember that their nation’s customs 
aren't necessarily best. Solter says that his 
Croatian background helped him here: 
“When you come from a small country, you 
don’t assume everybody should be doing 
things your way,” he says. “I never cared if 
somebody was different than me as long as 
it didn’t seriously affect the rest of the lab.” 


Roberta Kwok is a freelance writer in 
Kirkland, Washington. 
CAREERS 


Brain stimulator 


Materials scientist Bozhi Tian developed 
silicon-nanowire solar cells in 2007. Now, at 
the University of Chicago in Illinois, he has 
shown how similar nanoscale devices can be 
used to manipulate brain signals with light. 


Describe what your team’s latest study has achieved. 

We've shown that we can stimulate behaviour 
in a mouse by placing a nanoscale silicon 
mesh on the part of the brain that translates 
nerve impulses into movement, and then 
shining a light onto the mesh (Y. Jiang et al. 
Nature Biomed. Eng. https://doi.org/10.1038/s41551-018-0230-1; 
2018). For example, we 
were able to control the animal's left forelimb 
by flashing a beam at the right side of its brain. 


Why is this approach game-changing? 

Until now, there have been only two main 
methods of neurostimulation. The first 
involves implanting electrodes to produce 
neural activity. The second, called optogen- 
etics, requires cells to be genetically modified 
so that they can be controlled using light, 
but genetic modification is difficult and 
carries ethical concerns. Our approach uses 
a non-genetic neurostimulation device 
that enables distant nerves to be stimulated 
with light. It could have a big impact on the 
treatment of pain and other disorders. 


Has it led your research in a new direction? 
Yes, we now plan to do research on non-human 
primates — for example, exploring how to 
restore grasping functions after paralysis. 


Does the device have potential clinical 
applications? 

If we put the silicon mesh on the brain surface, 
we can elicit some activity deep inside the 
brain. And if you can stimulate the brain, you 
can treat certain neurological diseases or help 
a person to regain control over parts of their 
body. So it could be used to treat diseases such 
as Parkinson’s or disorders such as depression. 


Describe one of your breakthrough moments. 
My graduate student Ramya Parameswaran 
has demonstrated how a silicon nanowire can 
stimulate single cells (R. Parameswaran et al. 
Nature Nanotech. 13, 260–266; 2018). Some of 
the gold that catalyses growth of the silicon 
wire becomes individual atoms that cover the 
wire’s surface. When we removed the atomic 
gold, some neurons couldn't be activated. We 
later found that the gold enhanced the silicon’s 
electrochemical properties and made it a better 
neurostimulator. 


You switched your research focus from energy 
to neurostimulation. What happened? 

I left China to start a PhD at Harvard University 
in Cambridge, Massachusetts, in 2004, working 
on the use of single nanowires in photovoltaics. 
We showed that silicon nanowires can convert 
sunlight into electricity, just as conventional 
solar cells do. However, they are also small 
enough to be integrated into nanodevices. After 
that work was published (B. Tian et al. Nature 
449, 885–889; 2007), I decided to completely 
change my research interest from energy to 
electrophysiology and bioscience. 


Why did you make that change? 

Biology offered lots of opportunity and room 
for exploration, especially in bringing nano- 
wires and neuroscience together. And I like to 
explore the unknown. So, for the second half 
of my PhD, I worked on developing a transistor 
that's small enough to be delivered into a single 
cell to record electrical activities. 


How was the transition? 

It was tough. I had no experience in cell culture, 
electrophysiology or working with animal 
tissues. I took a neuroscience course and learnt 
tremendously from that. Even so, there were 
setbacks. For example, my cell cultures kept get- 
ting contaminated. But I told myself that good 
moments would come if I persisted. 


How did you find collaborators? 

When I began at Chicago, some biologists were 
interested in collaborating; others weren’t. I 
gave a lecture to biophysics students, and Ana 
Correa, a biochemist and the wife of molecular 
biologist Francisco Bezanilla, attended. She 
introduced me to her husband afterwards, and 
he and I have since written grants and papers 
together. Sometimes you just need that right 
moment and right person. 


INTERVIEW BY VIRGINIA GEWIN 


This interview has been edited for clarity and length. 
SCIENCE FICTION 


DNA EXCHAN 


BY D. A. XIAOLIN SPIRES 


[…] the dishes in a haphazard pile 
in the sink and watched 
as the auto-sorter pulled them 
into the dishwasher one by one, 
stacking them into supersonic 
racks. My mind reeled from the 
phone call from school. “Meiling 
was acting up again.” I mouthed start 
and the DishU hummed, giving off 
a scent of lemon in the process. The 
soothing movements of Smart DishU 
filled my eyes, mechanical limbs 
like spiders cartwheeling across my 
porcelain, singing them clean. 

I was lost in thought when I heard 
Securo’s deep voice announce: “Meiling 
has arrived.” The door slammed 
closed. I heard heavy footsteps run up 
the stairs. 

“Meiling, get over here,” I yelled. 
The footsteps paused, then I heard them in 
reverse. Trudge, trudge. Down and across 
the living room. 

She had her baseball cap on, which flashed 
‘Just 4 UR Evite’ the newest band sensation. 
Who were they again? Josh, Kick and Enlai? 
Or was that last week’s boy band? 

“Take that thing off and look at me,” I said. 
She shook her head, leaving a static holotrail 
ghost of the projection, her eyes hidden 
under the rim. 

I pulled off her cap. She shrieked. Her 
black hair flew in all directions. Teenage 
years, when would they be over? 

As she thrashed, saying it was unfair, 
I saw something jiggling off her neck, 
swaying with her angry convulsions. 

“What is that?” I pointed. It was flesh-coloured 
and squishy, a nub. It looked like a dead 
naked mole rat, hunched in fetal position. 

Meiling’s eyes lit up, looking utterly 
delighted. Her tantrum stopped. 

Dangling off her neck on a chain was a 
deformed, pale locket. 

She picked up the piece, squishing it 
slightly. The pendant compressed under the 
pressure of her finger pads. 

“You know you say I should stop being so 
materialistic,” Meiling said, smirking. “Well, 
I told Kyle that. He wanted to go steady with 
me and told me he’d buy me anything. I said, 
‘I don’t need you to buy me anything, but 
if you’re so into me, why don’t you give me 
your right ear?’” 

She laughed. It was cruel, the sound 
of my own narcissistic self echoing in my 
firstborn. 

“He said, ‘Ear? You mean, you want me 
to hear for you? I could install a chip and 
translate for you in Spanish class.’ And I said, 
‘No, I don’t need you to do my schoolwork, 
thanks. I want your ear, literally.’ He smiled, 
thinking it was a joke.” 

“His ear,” I said, incredulity cracking out 
of my voice. 

“Yeah, mom, keep up,” she said, rolling her 
eyes. Her fingers worked her way into the 
seams and folds of the whorl of pale dough 
in her hands. 

“Well, he did it. He gave me his ear. I 
thought it was so romantic, like Van Gogh 
from art class.” 

“That’s his ear, right there?” I felt queasy, 
my lunch was stirring in my stomach. I saw 
it, the whorls, the lobe. It felt like the air 
sucked up all sound. Only the DishU going 
off in a hum. 

“No, mom, don't be dumb.” 

“Don't you call your mother that.” I 
reached for her pendant, but pulled back at 
the same time, unsure whether I wanted to 
touch the supple cartilage. 

“He grew it. Really, it had another purpose, 
for the science fair, but he grew an extra one 
for me. On his arm, he’d been wearing long 
sleeves even in this June heat, because he 
wanted to surprise me. They sprouted on 
his forearm, right above the wrists.” 

She pointed at her own wrists. I scoured 
them, checking for any weird nubs. 

“That doesn't make it much better.” 

“He said it doesn’t have the organs to 
actually hear yet, but he’s working on 
the eardrum and seeing if he can project 
aural interpretation into his room. That 
way we can be connected.” 

I know teens have their own court- 
ship rituals, but this was out of control. 

“Give me that.” I pulled at her chain. 

“No!” she screamed. She seized the 
chain. “This is his promise necklace.” 

We struggled for five minutes. 

She shouted, “You could never under- 
stand me” and “Why did I even bother 
telling you” and “I should've just lied.” 

I yelled, “You just like to act out to get 
my attention” and “Well, now you have 
it and you better get rid of that gross 
piece of human meat off your chest.” 
The fresh scent of lemon continued to 
fill the room, a strength escalating in 
tune with our voices. 

“Mom, you'll never understand. You 
never lived with DNA exchanges and 
flitskin-tats, you'll never know how much 
this means!” She threw herself around to 
rush out. 

In her haste, her shirt behind her flew up 
and for a second, I saw an orb-like lump. 

“What is that?” I asked, my voice now low, 
unusually calm. 

Meiling knew that tone. That was the 
tipping point. If she pushed me over, she’d 
be grounded for weeks. No holocalls, no 
late-night trips to dessert shack, no e-cart 
creds. 

“It’s, uh, a decorative thing. Temporary.” 
She pulled her shirt down. 

I walked over and took a look. Her lips 
were quivering. 

It was an eye. A single eye, protruding 
right above her hips, resting on her lower 
back like a tramp stamp. 

“No, no, no...” 

“Well, you know. I can't leave him hanging. 
Love goes both ways. And he always said what 
beautiful eyes I had,’ she said, squirming. 

I looked into the pupil, which followed 
my face. It blinked. Its sinewy veins looked 
like DishU mechanical legs, spidery threads 
spreading across her back. I grabbed the 
holophone and mouthed “Dermatologist”, 
wondering how much this removal surgery 
would set us back. 


D.A. Xiaolin Spires spins in tune to the 
cosmos. Work in Clarkesworld, Analog, 
Fireside, Grievous Angel, Galaxy's Edge, 
LONTAR, Terraform, Sharp & Sugar Tooth 
and Broad Knowledge. Web: daxiaolinspires.wordpress.com 
Twitter: @spireswriter. 
Building the future 


Hong Kong is using its proximity to science hubs in mainland China 
to encourage innovation, but barriers to collaboration remain. 


BY JACK LEEMING 

In March this year, Hong Kong officials 
earmarked 50 billion Hong Kong dollars 
(US$6.37 billion) for investment in scientific 
and technological innovation. The move 
signalled a desire for tighter connections with 
China’s new science-based economy, and a wish 
to diversify Hong Kong’s own. 

Stephen Phillips runs a government department 
called InvestHK that promotes investments 
into the territory from the rest of the 
world. He hailed the budget announcement as 
“tangible evidence of the government’s determination 
to position Hong Kong well in a very 
competitive landscape”. 


This is a divergence from the history of the 
region. Since 1997, when the UK government 
returned the territory to China after its 99-year 
lease expired, Hong Kong has relied mainly on 
the historic strengths of its banking and prop- 
erty sectors to drive its economy (it has the 
world’s least-affordable housing market). This 
has left it vulnerable to shifting economic tides 
such as the 1997 Asian financial crisis. The rest 
of China, meanwhile, pivoted towards science 
and technology to bolster its economy, which 
was previously known for low-cost manufac- 
turing on a vast scale. 

Shenzhen, for example, a city just to the 
north of Hong Kong, which itself sits on a narrow 
peninsula in the Guangdong province 
of southern China, currently commits more 
than 4% of its gross domestic product (GDP) 
to research and development (R&D), much 
of that from private companies. Shenzhen is 
home to vast science firms, including BGI, the 
world’s largest genomics company, and DJI, 
the world’s market leader in drone technology. 
Hong Kong dedicated only 0.79% of its GDP 
to R&D in 2016. 

Increasingly, collaboration with the main- 
land, especially Shenzhen and the surrounding 
metropolitan area, is the key to maintaining 
Hong Kong’s economic strength in the 
future, says Matthew Evans, who runs the fac- 
ulty of science at Hong Kong University. 

The Pearl River Delta, of which the terri- 
tory is a part, is home to 66 million people. 
The region is named after the eponymous 
waterway that splits the area in half and forms 
Hong Kong island. “If you lift Hong Kong 
out and put it in the middle of the Pacific 
Ocean, will it still perform? Obviously not,” 
explains Andrew Leung, a business consult- 
ant on Chinese issues, who is based in the city. 
“Hong Kong must link to somewhere.” 

Before the increased budget, Hong Kong 
had already had some success in science 
through its existing infrastructure. Last month, 
for instance, the company SenseTime was 
crowned the world’s most valuable artificial- 
intelligence (AI) start-up, following an invest- 
ment from Alibaba, the e-commerce giant, of 
more than US$600 million. 

Originally conceived at the Chinese Univer- 
sity of Hong Kong (CUHK), SenseTime trans- 
ferred into the territory’s Science Park system, 
a government-financed incubator that offers 
subsidized benefits to start-ups. 

SenseTime’s speciality is using AI technology 
for facial recognition, and the company is regularly 
criticized by privacy campaigners. In the 
mega-metropolis of Guangzhou, also part 
of the Pearl River Delta, this method is used 
by local police to track the faces of suspected 
criminals. The company says it has helped to 
solve at least 100 cases in the city. 

As well as the budget announcement, Leung 
highlights other movements to spur scientific 
research in the territory in recent years, includ- 
ing the opening of the Innovation and Tech- 
nology Bureau. It was established in 2015 and 
is headed by Nicholas Yang, who was formerly 
vice-president of Hong Kong Polytechnic 
University. “The question,” says Leung, “is how
to marry research with businesses.” 


WAYS TO COLLABORATE 

Of the 50 billion Hong Kong dollars from the 
city’s budget, 20 billion will be spent develop- 
ing a physical manifestation of the spirit of 
collaboration. The Lok Ma Chau loop — an 
area of farmland between Shenzhen and 
Hong Kong — has been tagged by both cities’ 
officials as the site of a science and technol- 
ogy park. Officials say the park will be four 
times the size of current facilities, and it will 
be managed by the Innovation and Technol- 
ogy Bureau. Feedback is varied: critics have 
labelled it a government scheme to buy and 
develop property rather than anything truly 
committed to promoting R&D. 

Others disagree. Representatives for the 
science park declined to be interviewed for this 
article, but from 2016 to 2017 the number of 
start-ups enrolled in its incubator programme, 
which offers technical and professional support
and subsidized rents to science businesses,
increased by 8.4% compared with the previous
one-year period.

Biochemist Vasu Saini, who moved to the
city from India in 2016 to start his PhD at 
Hong Kong University of Science and Technol- 
ogy (HKUST), says he can see the value of the 
science park when he considers the jobs avail- 
able to his peers. “Hong Kong is driven mostly 
by the economic sector so there are not a lot of 
jobs for scientists,” he says. “But now there's a
science park, there are more opportunities for
people with that background.”


COLLABORATIVE SPIRIT 

In 2010, BGI established a facility in Hong 
Kong to supplement its existing sites in 
Shenzhen and form the basis of a collabora- 
tion with CUHK. This was a welcome addi- 
tion to the employment landscape for some. 
“I didn't really know if I could find a science
job in Hong Kong at first, but there are a lot 
of new companies and there’s a lot of new 
science here,” says Irene Chik, who was born in 
Hong Kong, and manages a lab in the facility. 
Her family emigrated to Canada in 1996 and 
she returned to Hong Kong in 2013. 




Academic work is also crossing territorial 
and cultural borders. Although Hong Kong 
has five universities in the top 100 QS rankings
for 2018, “there's not normally much of a
market for the basic research that Hong Kong
universities conduct”, says Naubahar Sharif,
who researches innovation and technology 
policy at HKUST. He says that collaboration 
with companies elsewhere in China might help 
to transfer that basic research into commercial 
applications, and he adds that collaboration 
with Shenzhen “allows the university sector in 
Hong Kong to have an outlet”. 

One example of this comes from DJI, which 
was founded in 2006 and, like SenseTime, 
was conceived in a Hong Kong university 
and expanded into Shenzhen when the time 
came to scale up production. Kei May Lau, an 
electronics engineer at HKUST, worked in the 
lab next door to robotics engineer Zexiang Li, 
who was an early adviser and investor at DJI. 
One day, Li showed her some of the images 
he’d taken with a drone he’d made with Frank
Wang — now the billionaire chief executive of 
DJI. The company’s drones have become ubiq- 
uitous worldwide, especially in film-making 
and photography. “There were these scenic
mountains that they climbed up” and flew the
drone over, says Lau. “I said, ‘Wow. In the old
days, if National Geographic wanted to do this,
they’d have to spend thousands of dollars on
a helicopter to get there. Now, they can take
everything they need in a backpack.’”

Other universities are sitting up and tak- 
ing notice of these and other commercial 
successes, and a rush to work with Chinese 
collaborators is starting in earnest, partially 
facilitated by incentives from the government 
in mainland China to encourage collaboration 
through the Hong Kong-Shenzhen border 
and across academic-industry barriers. 
These include giving researchers access 
to federal funds previously unavailable to 
scientists in Hong Kong, a development pub- 
licized in Chinese state media this month. An 
older scheme to share the costs and results of 
applied research with private-sector partners 
is another positive. 

Lau says that this has resulted in funding 
in Hong Kong being skewed towards applied 
research. One of her projects is funded by the 
Innovation and Technology Bureau, which 
insists that scientists must match their funding 
with the same amount of investment from the 
private sector. “They want to make sure you're 
not doing pie-in-the-sky research,” she says.

Companies in China and universities in 
Hong Kong can both find natural benefits in 
collaboration, says Sharif. “Shenzhen and the 
nine or so other cities in the Pearl River Delta 
have been called the workshop of the world, 
and they have expertise in manufacturing 
which Hong Kong doesn’t — but Hong Kong 
has these world-class universities which the 
delta doesn’t have.” 

Evans sees the advantages for institutions 
such as his own, but says that any significant 
collaboration will take an investment of time 
and good faith. 

“One of the challenges for this university 
and for the rest of Hong Kong is to make these 
collaborations work,” he says. “We're part of the
Pearl River Delta too, and we should be one
of the driving forces on the academic side of
that. If I have anything to do with it, we will be.”

“That’s only going to work if the academic 
sector engages with the business sector,” he 
says. “I’m spending increasing amounts of my 
time going over into that area, into mainland 
China, and talking to companies and people 
there about how we can work together.”


FEELING FAMILIAR 

Phillips points to Hong Kong's Western-style 
legal framework and intellectual-property 
laws — hangovers from its history as a British 
colony that were enshrined in the ‘one coun- 
try, two systems’ agreement drafted in the 
reunification process with China. Overseas 
investors are comfortable spending in Hong 
Kong because it is culturally accessible, he 
argues, unlike the Chinese mainland, where 
many feel that the financial risk is too great 
and the landscape too unfamiliar, and where 
foreign companies often find themselves
mired in bureaucracy. That works both ways.
Hong Kong is also a place for Chinese businesses,
in science and elsewhere, to internationalize.
Vivek Wadhwa, a distinguished
fellow at Carnegie Mellon University’s campus
in Mountain View, California, and a columnist
on scientific innovation, agrees. “The West
doesn’t trust Chinese companies. But it does
trust Hong Kong’s laws,” he says. “This solves
many problems for the West.”

The China-Hong Kong Nature Science Project in December 2015.

Culturally, too, Hong Kong has draws for 
Western investment and talent that mainland 
China lacks. “If I take you and drop you down
in most Chinese cities, the road signs to the
food to the language to the culture will be
overwhelmingly Chinese,” says Evans. “Hong Kong
is a more accessible city for westerners.” 

David Zweig studies Chinese diplomacy 
and the country’s efforts to attract researchers 
at the Center on China's Transnational Rela- 
tions at HKUST, which he runs. He says that 
there are more differences between China and 
Hong Kong when it comes to their academic 
sectors. “The level of academic freedom from 
bureaucratic interference is preferable in Hong 
Kong. You don't need to kiss ass to government
administrators so much,” he says. He
says that returnees, people born in China who 
studied and worked elsewhere and have since 
returned, complain that the bureaucrats “have 
too much power in the mainland; if you want 
to get access to research money in your first 
couple of years, it’s not easy.” 

There are still barriers to collaboration, 
however. Wadhwa says that although Hong 
Kong might have aligned itself with China’s 
science and technology priorities, this is not 
necessarily enough to maintain economic 
growth. “Yes, there might be opportunities to 
collaborate with places like BGI in Shenzhen,”
he says, “but so what? That doesn't build a
new economy for Hong Kong. That's not true
innovation.”

Politics is another barrier to further collabo- 
ration and integration. Many of Hong Kong’s 
citizens remain protective of their rights to 
vote, publish and march against the govern- 
ment's interests — rights that those outside the 
city do not have. The success of data compa- 
nies such as SenseTime, which critics say will 
cement China as a surveillance state, does little 
to alleviate common fears about what life will 
be like as we approach 2047 — the due date 
for full integration with China. 


PROGRESS 

Whatever happens politically, it’s likely that sci- 
entists in Hong Kong and the rest of the Pearl 
River Delta will draw closer together as time 
goes on — and many in the scientific world 
generally see this as a positive. “When you look 
at what's happening in this neck of the woods, 
it's about complementarity,” says Phillips. “It's
not about one city being set up against another. 
That’s not how businesses see it. They look at 
how they can deliver business.” 

Evans is confident that collaborations 
across the border will strengthen in academia, 
too. “We have to work hard to make it happen 
now,” he says. “If it doesn't happen now, we're
going to lose our place.” Fortunately, Hong 
Kong is a city used to swift transformation, 
Chik says, especially when compared with her 
previous home of Vancouver. “It’s fast-paced 
and ever-changing, and if you don't change 
fast enough, then all these competitors — they 
will overtake you.”


Jack Leeming is the editor of Naturejobs. 